PSYCHOLOGY IN JAPAN
KOJI SATO AND C. H. GRAHAM
Kyoto University and Columbia University 


This paper presents a survey of 
Japanese psychology, first, from a 
historical point of view, and second, 
from the point of view of describing 
some of its modern aspects. The occa- 
sion for organizing the paper arose in 
the summer of 1952 when the two 
authors were members of the Seminar 
in Experimental Psychology, one of 
the Kyoto Seminars in American 
Studies conducted under the auspices 
of Kyoto University, Doshisha Uni- 
versity, the University of Illinois, and 
the Rockefeller Foundation. Many of 
the ideas expressed and information 
contained in the account are products 
of the many discussions that went on 
among the Seminar participants 
along with the formal academic ac-
tivities.¹

The present survey attempts, with- 
in its limitations, to trace, in general 
outline and without claim to ex- 
haustiveness, certain historical pat- 
terns of Japanese psychology and 
their emergence into the modern 
stream of fact, theory, and practice. 
The account enumerates many re- 
searches that have been carried out
by Japanese investigators, researches
that are not usually well known in 
the West. It is hoped that the pres- 
ent description will stimulate inter- 
est in these contributions. 


1 We are greatly indebted to Mrs. Harouye 
Fukuhara of Doshisha University, Kyoto, for 
her generous help in accumulating materials 
on which this report is based. 


HISTORY OF PSYCHOLOGY IN JAPAN
Psychology until 1926 


Western psychology was intro-
duced in Japan about 10 years after
the beginning of the Meiji Era (1868-
1912) against an ancient background
of Indian, Buddhist, and Chinese
philosophy. Once the introduction
was effected, translations were
quickly made of books by Haven, 
Bain, Sully, Wundt, and Ladd. The 
first experimental researches of Japa- 
nese psychologists were performed in 
America and Germany in the last 
decade of the nineteenth century by 
such men as Motora (111), Matsu- 
moto (87, 88, 89, 90, 91), Nakajima 
(116), T. Okabe (142), and Kakise 
(43). 

Yujiro Motora (1858-1912) and 
Matataro Matsumoto (1865-1943) 
were the principal pioneers of Japa- 
nese psychology. Motora was the 
first professor of psychology in Tokyo 
University. He had a philosophic
mind and tried to establish a system 
of psychology. Matsumoto was pri- 
marily an experimentalist. He de- 
signed the psychological laboratory 
of Tokyo University in 1903, after
inspecting laboratories in America
and Germany, and established a
laboratory at Kyoto University in
1907. He became the professor of
psychology at Kyoto in 1906 and
succeeded Motora at Tokyo after
the latter’s death in 1913. He advo-
cated the study of “psychocine-
matics,” a science of psychophysio-
logical behavior, and became the
father of applied experimental psy-
chology in Japan. He survived
Motora by 31 years. Most of the
older leaders of Japanese psychology
are his students.

The first 25 years of the twentieth
century constituted an orientation
period in the methods of psychologi- 
cal research. Before 1925 studies ap- 
peared under the names of such re- 
searchers as Chiba (16, 17), K. Ma- 
suda (83, 84, 85), Yh. Kubo (64, 65, 
66), K. Tanaka (183, 184, 185), G. 
Kuroda (70, 71), Sakuma (154, 155, 
156, 157), Koga (58), Chiwa (18), and 
Narasaki (119). In 1911, Ohtsuki 
(138) published his Experimental 
Psychology, a book of 1,000 pages, 
which owes its material to Wundt, 
Titchener, and other German and 
American psychologists. The preva- 
lent schools of psychology in Japan 
at this time were Wundt’s appercep-
tion psychology and American func-
tionalism. Students of structuralism,
such as Yokoyama (218, 219) and 
Takagi (175), came at the end of this 
period, as did Yatabe, who had just 
returned from Piéron’s laboratory. 

Experimental studies had started 
in the psychological laboratories of 
Tokyo and Kyoto universities near 
the beginning of the century. These 
works were first published as mono-
graphs, and it was not until the Japa-
nese Journal of Psychology was
founded in 1919 at Kyoto by Genji
Kuroda that a ready means of publi- 
cation became available. The edi- 
torial office was moved to Tokyo in 
1923. 

With the advent of better economic 
conditions after World War I the 
number of educational institutions in-
creased greatly. Departments of psy-
chology were established in Tohoku
(Sendai), Kyushu (Fukuoka), Keijo
(Seoul, Korea), Taihoku (Taipei, 
Formosa), Nihon, Hosei, Waseda, 
Keio (the latter four in Tokyo), 
Doshisha (Kyoto), Kwansei Gakuin 
University (Kobe), and the Tokyo 
and Hiroshima Colleges of Arts and 
Science. In addition, many psycholo- 
gists obtained positions in the Koto- 
gakko (essentially junior colleges). 

It is not surprising that the increase 
in Japanese educational facilities was 
accompanied by a quickened interest 
in educational psychology, an inter- 
est that was becoming strongly estab- 
lished by World War I. The mental 
test movement found a home in Ja- 
pan, but many of its technical aspects 
were not developed along the lines 
that prevailed in America. 

Studies of personality did not be- 
come important in number until the 
1930's, but it is important to note 
that Watanabe wrote a book on per- 
sonality (202) as early as 1912. In- 
dustrial psychology was recognized 
by some people as an important field 
of research during World War I, but, 
probably due to problems peculiar to 
Japan because of overpopulation, it 
did not develop (nor has it yet de- 
veloped) to the same important posi- 
tion it holds in the United States. 
In clinical psychology, the psychi-
atric institutes of Tokyo and Kyoto
universities participated in the men-
tal test movement and devised psy-
chodiagnostic methods, but only a
few individual psychologists, e.g.,
Oguma (133), took an interest in 
problems of abnormal psychology. 
Kuwata (75) of Tokyo University 
introduced the folk psychology of 
Wundt, and Iritani (36) the social 
psychology of McDougall. 

In general, it may be said that de- 
spite the existence of certain scat- 
tered interests in other areas, psychol- 
ogy in Japan until 1926 was largely 
dominated by educational psychology
and classical experimental psychol-
ogy.

One major contribution of great in- 
fluence that was to have an impor- 
tant bearing on later developments in 
Japanese psychology appeared in 
1925. In this year Matsumoto wrote 
his comprehensive Psychology of Intel- 
ligence (90), a book that dealt not 
only with intelligence, but also with
senescence, mental work, environ-
mental factors, efficiency, and mili-
tary psychology. In addition it in-
cluded a broad commentary on the
general development of psychology.
It involved a survey of research done
by Japanese psychologists under his
(Matsumoto’s) guidance as well as a
consideration of related work done by
western psychologists. It was a mile-
stone in the development of Japanese
applied psychology.

From 1926 through World War II 


The beginning of the Showa Era 
(1926) marked the development of 
some important trends in Japanese 
psychology. At that time the influ- 
ence of gestalt psychology was be- 
ginning to be felt; the Japanese 
Psychological Association was es- 
tablished, as was the new series of the 
Japanese Journal of Psychology; and 
the number of graduates in psychol- 
ogy showed a remarkable increase 
due to the expanded system of higher 
education that developed after World 
War I. 

Sakuma (157) and Onoshima (146), 
who studied at Berlin University, 
were the chief expositors of gestalt 
psychology. Sakuma’s translation of 
Köhler’s Gestalt Psychology (154) is
well known. It can safely be said 
that gestalt theory has colored the 
thought of most Japanese psycholo- 
gists from about 1926 to World War 
II (and even now exerts a powerful
influence). The years around 1926
also saw the introduction of the 
works of the Leipzig school (e.g., 
those of Krueger and Volkelt) by 
Iwai (40, 41) and others (159) of 
Kyoto University. It would, how- 
ever, be wrong to think that these in- 
fluences were unaccompanied at this 
time by original Japanese contribu- 
tions. Chiba proposed a theory of 
“Eigenbewusstsein,” R. Kuroda the
psychology of “comprehension,” and
Sakuma a theory of “basic conscious-
ness” as directly experienced. These
contributions may be seen as an at- 
tempt to revise the traditional theory 
of consciousness in the light of some 
characteristics of oriental culture. 
Among books that appeared near 
1926, several deserve particular men- 
tion. K. Masuda’s Introduction to 
Experimental Psychology, a manual 
for beginners, appeared in 1926. The 
same author’s Methodology of Psychol- 
ogy, published in 1934, is an author- 
itative work in general psychology. 
Yh. Kubo compiled two volumes of 
his important Handbook of Experi-
mental Psychology during 1926 and
1927, and Kido’s General Outline of 
Psychology appeared in 1931. 


Experimental Psychology 


What were some of the evident 
trends in experimental psychology 
from 1926 to World War II? 

Perception. In the years between 
1926 and 1939, the study of percep- 
tion progressed at a quickened pace, 
usually along lines dictated by gestalt 
psychology. A short enumeration 
of some of the studies undertaken 
may be used to indicate the areas of 
greatest interest. 

The Psychological Institute of Kyu- 
shu University published a series of 
experimental studies on the structure 
of perceptual space directed by 
Sakuma, Yatabe, and Akishige (8, 9, 
95, 158, 182, 206). Takagi (176)
compared the structural, phenome- 
nological, and gestalt views of per- 
ception. Obonai (30, 121, 125, 126, 
127) continued (and has to the pres- 
ent time) his studies of psycho- 
physical induction in perception, 
memory, and other areas. (Most of 
the contributions are in the area of 
visual perception, where Obonai has 
tried to formulate laws of induction 
as extensions of the laws of contrast 
and confluence.) 

Matsubayashi (86), an ophthal- 
mologist, performed his classical ex- 
periments on depth in 1937 and 1938. 
A number of investigations of the 
visual constancies were carried out. 
Akishige (8), Ibukiyama (27), and
others worked on size constancy;
Yt. Kubo (67) on shape constancy;
and Ogasawara (130) on the con-
stancy of phenomenal velocity. Nishi
(121) studied the moon illusion.
Studies on the perception of move-
ment may be represented by the in-
vestigations of Mukuno (113) on
tactile apparent movement, S. Kubo 
(63) on visual apparent movement, 
Hisata (27) on auditory apparent 
movement, Fukutomi (21) on delta 
movement, and Ogawa (131) on the 
seen path of real movement. Studies 
on form perception were carried out 
by Morinaga (102), Hayami and 
Miya (25), Yagi (203), and others. 
Studies of the relation between time 
and space (the “tau effect” and its
reverse) were performed by Abbe 
(1, 2) and by Suto (172). 

At this time Y. Wada (200) per- 
formed experiments on the time error 
for auditory stimuli. Other studies 
in audition were not extensive, but 
Yuki made an important contribu- 
tion when he published A Psychology 
of Tone (221) in 1933. In the field of 
gustatory sensitivity, Rikimaru (151) 
made a survey of taste blindness
among Japanese, Chinese, and For- 
mosan natives. 

The area of eidetic imagery was 
investigated by Ohwaki and Masaki 
(80) and the former (139) published 
his book Eidetic Imagery of Children
in 1935. 

The work in perception during this 
period was essentially the product of 
a relatively small number of Japanese 
psychologists. These psychologists 
were very active and contributed a 
number of results which are, unfor- 
tunately, not as well known in west- 
ern countries as they should be. 

Human learning, animal learning, 
and thinking. The field of human
learning was not so intensively in- 
vestigated from 1926 to 1939 as was 
the area of perception. The studies 
that were done were generally per- 
formed within the framework of 
gestalt psychology. For example, 
Ushijima (197) studied the achieve- 
ment problem (Erfolgsproblem) in 
learning, Amano (11) investigated 
“memory traces” in the tradition of
the Wulf experiment, and problems 
of retroactive inhibition, proactive 
inhibition, and reproductive inhibi- 
tion were taken up by Sagara (152, 
153) and Maeda (76). 

Researches in the field of animal 
behavior were sparse, but a few in- 
teresting investigations were under- 
taken. R. Kuroda published studies 
on the hearing of reptiles as early as 
1923. Thereafter he extended his re- 
searches to the monkey, white rat, 
and tortoise. He wrote a Psychology 
of Animals (74) in 1936. Takagi 
(178) studied the influence of back- 
ground upon the transposition of se- 
lective responses to brightness (in the 
varied tit) and investigated form 
constancy in the tomtit (177). The 
transposition of selective responses 
was also studied by Ohtsuka (137)
and Shirai (164). Odani (128), a psy- 
chologist and psychiatrist, studied 
the effects of cerebral lesions upon 
the visual and auditory discrimina- 
tion habits of white rats. Yamanou- 
chi, a biologist, published his Psy- 
chology of Animals in 1938. Hayashi 
(26), a student of Pavlov, pioneered 
in the study of conditioning. 

The general problem of thinking 
has been of great interest to Japanese 
psychologists. Kido (69), Mitui (96), 
and Tomeoka (189) of Hosei Uni- 
versity performed experimental ob- 
servations on the process of doubt 
and speculation. Sato (160), Abe 
(5), and Amano (12) analyzed the 
process of comparison by introspec- 
tive methods. Sato (160) has extended 
his interest in this general area to 
investigations of developmental as- 
pects of the comparison process. 


Personality 


In the thirties German character- 
ology was introduced into Japan, 
and some experiments (10) were per- 
formed within the framework of this 
viewpoint. These experiments con- 
stituted the first systematic work in 
the field of personality. For example, 
psychologists (7, 187) at Waseda
University performed research in the
context of Kretschmer’s typology,
and Uchida (191), in particular, de-
vised a continuous addition test, es-
sentially a modification of Kraepel-
in’s work. Uchida’s test is now in
general use in Japan (190). During
this period, Abe (6), Susukita (171),
and others, under the guidance of
Chiba, studied personality differences
among Manchurian races. Masaki
and Yoda (82) studied personality
from the viewpoint of educational
psychology and wrote A Psychology
of Character in 1937.


Educational and Clinical Psychology


Educational psychology, one of the 
two oldest fields of psychological in- 
vestigation in Japan, showed consid- 
erable progress between 1926 and 
World War II. Studies of develop- 
ment, learning, personality, work, and 
fatigue increased greatly in number 
and had direct influences on educa- 
tion. 

The Institute for Child Welfare of 
Aiikukai and the Institute for Child 
Study in Nihon Women’s University 
played particularly important roles in 
researches in child psychology during 
the thirties. Tokyo (with Tanaka, 
Narasaki, Terazawa, and Takemasa 
[120, 181, 183]) and Hiroshima College
of Arts and Science (with Kubo and
Koga [57, 64, 66]) were centers of edu-
cational psychology. Kyoto Uni-
versity, with Nogami (123) in ado-
lescent psychology, and Iwai, Kato,
Sonohara, and Moriya in child psy- 
chology, was a center of develop- 
mental psychology. At this time Yh. 
Kubo (66) wrote his Child Psychology 
under the influence of Charlotte 
Bühler, and Hatano introduced Pia-
get's work (24) to Japan. The main
weakness of the child studies carried 
out in this period lay in the fact that 
no provisions were made for large- 
scale follow-up investigations. Inves- 
tigations in the areas of maturity and 
old age were done almost solely by 
Tachibana (174). 

The investigation of children’s per- 
sonalities was taken up in the period 
from 1920-1935. Intelligence tests 
and personality tests were not elab- 
orately developed or used in Japan 
until after Ohtomo, a student of 
Judd, wrote his two-volume Diag- 
nostics of Education (136) in the 
period 1928-1933. A revision of the
Binet-Terman tests of intelligence by
H. Suzuki (173), then a school in-
spector of Osaka City, has been in
general use in Japan since 1930, and
group intelligence tests were also
adapted for Japanese use by Tanaka
(185), Kirihara (53), Awaji (13), and
others. The only personality tests
developed during this period were a
test of extroversion-introversion by
Awaji and Okabe (14), and a revision
of the Downey Will-Temperament
test by Kirihara.

In the 1930's psychologists, child
psychologists, and educationists
formed groups to study clinical and
abnormal child psychology. Their
studies mark the beginning of clinical
psychology in Japan. Several child
guidance clinics were established
about 1930 (in Kyoto City, Kobe
City, Hyogo Prefecture, Aichi Pre-
fecture, and Tokyo City). At the
same time a few clinics were estab-
lished in prisons, courts, and reforma-
tories under the Japanese Judicial
Department and in guidance centers
set up by the Welfare Department.
The term clinical psychology was
never used officially during this pe-
riod, and the work of the few psy-
chologists employed in the clinics was
limited to diagnosis and the evalua-
tion of intelligence and character.
The work of the early clinicians and 
the work of the educational psychol- 
ogists overlapped, but their respec- 
tive efforts differed in one respect: the 
former extended their methods to 
meet the requirement of evaluating 
abnormal children and criminals. In 
a few cases the clinicians adopted 
methods typical of psychoanalysis
(Marui [79] and others).


Industrial Psychology 


This field, always at a disadvan- 
tage in Japan because of the ever- 
present overpopulation and its con- 
sequence, cheap labor, did not pro- 
gress as rapidly during the 1930's 
as some other areas of psychology. It 
has been mentioned that, during
World War I, attention was paid to 
industrial psychology and some sys- 
tematic studies of the labor problems 
were made subsequent to 1926 by 
members of the Kurashiki Institute 
of the Science of Labor, founded in 
1921 by M. Ohara. (The Institute 
moved to Tokyo in 1937 and has con-
tinued its work until the present
time. It has included several psy- 
chologists on its staff, Kirihara and 
Ueno [52, 192], for example.) The 
Prefectural Institute of Industrial 
Psychology of Osaka was founded 
about 1925. By the middle of the 
thirties the Ministry of Welfare and 
the Government Railways employed 
a number of psychologists for voca- 
tional guidance and the promotion 
of efficiency of labor. There have 
been two journal publications in this 
field: The Science of Labor and Man- 
agement. During the war human re- 
sources were in short supply, and 
many books on the management of 
labor appeared. A considerable num-
ber of psychologists entered the
Army and the Navy and took part 
in researches on aptitude testing and 
personnel problems. 


Social Psychology


Wundt’s folk psychology and Mc-
Dougall’s social psychology were
studied at an early date by Japanese 
psychologists, but research in the 
social area did not develop for a long 
time. The measurement of attitude 
was first undertaken by Koga (56) 
about 1934, and problems of race 
differences (133) and group psychol- 
ogy (46, 47, 97) interested psychol- 
ogists from about 1931 to World 
War II. During this period, prob- 
lems of national morale were seriously 
taken up by only a few psychologists, 
due, probably, to the fact that in a 
Japan dominated by the military 
caste, such questions were considered 
to be political or philosophical in
nature rather than psychological. 


General Observations 


During the period 1926 to World 
War II, Japanese psychologists were 
self-conscious and self-critical. Their 
science was young, and they were 
passing through a period of self-ex- 
amination and, to some degree, of 
confusion. Matsumoto, president of 
the Japanese Psychological Associa- 
tion in 1933, discussed (91) what he 
considered to be the three weak 
points of Japanese psychology: (a) 
lack of communication among psy-
chologists, resulting in a decreased ef-
fectiveness of research; (b) excessive
use of group methods involving
questionnaires, tests, etc., and the
paucity of analytic experimental re-
searches; (c) excessive speed in adopt-
ing new, and probably transient,
western psychological concepts, with-
out, in all cases, applying appropriate
historical criticisms and evaluation
of facts. 

The years near 1935 were the high 
watermark of Japanese psychology 
before World War II. The number of
psychological journals was at a peak 
and psychologists were very active. 
The Tohoku Psychologica Folia (Sen- 
dai), the Japanese Journal of Experi- 
mental Psychology (Kyoto), and the 
Acta Psychologica Keijo (Seoul) be- 
came important scientific journals. 
The Japanese Journal of Educational 
Psychology (Tokyo), the Japanese 
Journal of Applied Psychology (Hiro- 
shima), and Animal Psyche (Tokyo), 
featuring articles in a somewhat 
popularized style, signaled the emer- 
gence and development of new and 
important areas. Two journals in 
fields related to psychology, Studies 
in the Science of Labor and Beiträge
zur Psychoanalyse (Sendai), became
influential. According to a survey by 
Nakamura and Nagasawa, about 100 
Japanese works were abstracted in
the Psychological Abstracts between 
1933 and 1940. 

The number of reports given at the 
fourth meeting of the Japanese Psy- 
chological Association at Sendai in 
1933 was 84; at the fifth meeting at 
Tokyo in 1935, 126; and at the sixth 
meeting at Keijo, Korea, in 1937, 63. 
At these meetings, strictly experi- 
mental work constituted about 40 
per cent of the papers. Sensation and 
perception were the major interests, 
about one in four papers being in 
these areas. 

Psychological research in Japan 
was severely reduced at the time of 
the outbreak of the Sino-Japanese 
conflict in 1937 and its later develop- 
ment into the Pacific war in 1941. 
Several journals could not be con- 
tinued, and at the end of the war even 
the Japanese Journal of Psychology 
could not be published regularly. 
During the long period of strife a 
number of psychologists worked in 
the Navy and the Army, but, in 
general, it may be said that psycho-
logical research was almost com-
pletely interrupted. The number of
graduate students in psychology
dwindled to essentially zero.


From the End of World War II to the
Present


After World War II a great num-
ber of changes took place in Japanese
life, many of them attributable to 
the Occupation. Among other things, 
the changes include a new and liberal- 
ized Constitution and a proposed 
renovation of the Japanese educa- 
tional system. We cannot consider 
here all the factors behind the influx 
of students into universities and the 
influence exerted by the “new” edu- 
cation. It is sufficient to say that the 
number of graduate students in psy- 
chology increased considerably above 
the prewar level, and there was 
heightened interest in psychology,
particularly in areas that had not 
been well developed previously. At 
the same time, the Japanese Psycho- 
logical Association greatly increased 
its membership, and its meetings 
were held annually. During the 1951, 
1952, and 1953 meetings, the number 
of reports presented exceeded 300. 
The percentage of the reports that 
could be termed experimental was 
about 30. 

A further discussion of trends may 
here be aided by a subject-by-subject 
consideration of work in the various 
areas. 


General Psychology 


Yatabe wrote his Introduction to 
Psychology (213) from the standpoint 
of general behavioristics (1950), and 
a Handbook of Psychology in 12 vol-
umes is now being prepared under
the sponsorship of the Japanese So- 
ciety of Applied Psychology. A latent 
behavioristic influence has now be- 
come fairly well established. In 
1941 Imada translated Bridgman’s 
Logic of Modern Physics (33) into 
Japanese, and a criticism of opera-
tionism was expressed in the same
year by Yatabe (209).


Experimental Psychology 


Handbooks of experimental psy- 
chology have been edited by Takagi 
and Kido, of which Volume I (gen-
eral methods of experimentation),
Volume II (vision), and Volume III 
(audition and other sensory experi- 
ence) have thus far been published 
(180). As to methodology, it is 
worth noting that interests in sto- 
chastics and factor analysis have in- 
creased among Japanese psycholo- 
gists since the war, and some re- 
searches are being conducted in these 
areas (Koga [57], Indo [34], Iwahara 
[39] and others). 


Perception. A book on the psy-
chology of perception was published
by Ogawa, Tanaka, and Osaka (149) 
in 1952. Recently, Obonai (127) has 
surveyed western and Japanese stud- 
ies on visual perception. 

Experiments on perception are, in 
general, maintaining ties with the 
earlier gestalt influence, now some-
what mixed with behavioristic tend-
encies. Interest continues to be
shown in problems of space percep-
tion generally (for example, by
Akishige [9], Miyakawa [100], Osaka
[148]), but problems of size and shape
discrimination (as related to the per-
ceptual constancies) seem to be par-
ticularly popular (Kume [68], Ma-
kino [77], Misumi [95]). Quantitative
theories of space, as, for example,
Luneburg’s, are received with con-
siderable appreciation.

The topic of figural aftereffects is
being studied with enthusiasm, and
in the opinion of one of us (CHG)
Japanese studies in this area are
being done very effectively (e.g., by
Ogasawara [130], Azuma [15], Oyama
[150], and others [30, 123]). Kakizaki
is studying the effect of preceding
conditions on retinal rivalry. Ex-
periments on time errors have been
done by Inomata (35), Nakajima
(115), Ono (145), and others. Obo-
nai’s “induction theory” and Yo-
kose’s psychophysical research (207,
216) on form perception are influen-
tial contributions.

Interest in psychophysiological
problems has increased since the end
of the war; this is due, among other
influences, to the very original work
of the physiologist Motokawa (107,
108, 109) on the electrical excitability
of the eye. Studies on electrical ex-
citability are now being carried on in
some psychological laboratories.
They involve studies of the time
course of various color effects, form
distortions, contrast effects, etc.

Many areas of perception are being
examined, but it will not do at this
time to describe the efforts in great 
detail. One trend may be worth men- 
tioning: the increased communica- 
tion of psychologists with physiol- 
ogists and people in related sciences. 

Audition has not become as ex- 
tensive a research interest as visual 
perception. Nevertheless, it is inter- 
esting to observe that Wada (201) 
has recently (1951) published a book 
with the same title as Yuki’s earlier 
volume, A Psychology of Tone. A 
recent publication concerns an analy- 
sis of Japanese vowels (110). 

Other sensory systems have not 
been studied extensively since the 
war. 

Animal and human learning; think-
ing. The greatly increased interest in
animal experimentation is a major 
postwar development. Many experi- 
ments have been done recently at 
the universities of Tokyo, Kyoto, 
Tohoku, Hokkaido, and Osaka. The 
central problem has been the con- 
troversy between field theory and 
reinforcement theory. Studies have 
been made on the problems of sub- 
goal responses (Umeoka [194] and 
Yagi [204]), latent learning (Nozawa 
[124], Asami [38]), and the effects of 
amount of reinforcement and amount 
of drive. Studies of reasoning (Sue- 
naga [169]), place learning, and ex- 
perimental neurosis (Murata [114]) 
are in progress at this time. It is 
also worth noting that a Society for 
the Study of Behavior Theory has 
been established by psychologists at 
the universities of Kyoto, Osaka, and 
Osaka City. 

Considerable interest also exists in 
problems of human learning. Recent 
studies have been concerned with 
retroactive and proactive inhibition 
(Sagara, Ishiwara [37], Umemoto 
[193]), and researches are in progress 
on questions of temporal factors in 
interpolated materials and of gener-
alization effects between original lists
and interpolated lists. Human condi- 
tioning experiments (61, 62) have 
been almost wholly restricted to the 
laboratory of Kwansei Gakuin Uni- 
versity under Kotake. Studies of 
transposition responses in children 
have been performed, and the develop- 
mental aspects of this type of be- 
havior are now being examined by 
Sato, Okano (144), Motoyoshi (112), 
and others. 

Little work is being performed on 
motor skills. 

Recently a book on the psychology 
of learning was published by Yagi, 
Umeoka, and Maeda (205). 

In the period 1946 to 1949, Yatabe 
published A History of Thinking
(211) and three volumes of his Psy- 
chology of Thinking (212). Volume I 
of the latter work is on concept and 
meaning, Volume II on relations and 
reasoning, and Volume III on the
thinking of animals. These books by
Yatabe, together with his History of
the Psychology of Will (210), are
authoritative and comprehensive. 
They are not as well known outside of 
Japan as they should be. 

Personality. Research on personal-
ity per se has not expanded after the
war to the same degree as some other 
areas of interest. In this connection, 
however, it is important to observe 
that many investigators are at work 
on experiments that are closely re- 
lated to this area, for example, those 
that deal with the influence of mo- 
tives on discrimination (92, 98, 162). 
In fact, most Japanese experimental 
psychologists show considerable in- 
terest in this topic and experiments 
are being done that bridge the bound- 
ary between studies of personality 
and perception (45). Some psycholo- 
gists (e.g., Kitamura and Imada) 
have recently considered the problem 
of the ego. 

Recently (1951) one of us (Sato) 
published A Psychology of Personal-
ity (161) which discusses the train-
ing procedures of Zen Buddhism
against a background of western 
studies. 


Developmental Psychology, Education- 
al Psychology, and Measurement 


The reform of the Japanese educa- 
tional system, instituted by the Oc- 
cupation, involved an important pro- 
gram of teacher education and re- 
education. The re-education program 
has placed great stress on the data of 
educational psychology, and, under 
the new system, many school teach- 
ers have been instructed in this area. 
If widespread knowledge of an area 
is an important contributor to its 
vitality, then educational psychology 
(81, 212) is very much alive in Japan. 
American psychologists who went to 
Japan during the Occupation were 
usually educational psychologists,
and it may be expected that the re- 
sults of their efforts will show tan- 
gible returns in the way of intensified 
activity in this area. 

Recently several books (22, 212) 
on the developmental aspects of edu- 
cational psychology have been trans- 
lated into Japanese; several books 
have been written on child psychol- 
ogy with an essentially educational 
emphasis, for example, one by Yama- 
shita (206); several books on the psy- 
chology of adolescence (Katsura [49], 
Ushijima [198]) have also appeared. 
Little attention has as yet been paid 
to problems of maturity and old age. 

Experimental studies of mental
development are being carried on,
notably by Sonohara (167) and 
Nakano (118). Takemasa published 
two volumes on Developmental Psy- 
chology (181) in 1948-1950. The 
large public interest in educational 
and developmental psychology is re- 
flected in the fact that journals in
this area are edited for the general
public as well as the scientific. Em- 
phasis is placed on the problem of 
learning and its relation to vocational 
and educational guidance. 

Work continues in the general 
areas of intelligence and personality 
tests in several institutions, most 
notably in the National Research 
Institute of Education. Tanaka, for
years one of the dominant figures in 
educational measurement, still re- 
mains a leader. 

An emphasis on mental hygiene 
characterizes postwar educational 
psychology, and some psychiatrists 
and psychologists are now working 
together. In Japan, the term “men-
tal hygiene” is used in its broadest
sense to apply to normal as well as 
abnormal individuals. In student
counseling work, many problems
peculiar to Japan arise in a way that 
may have no parallel in the United 
States. In particular, political prob- 
lems are said to provide an “appar-
ent” focus of maladjustment in stu-
dents. In any case the student coun- 
selor has an important role to play 
in modern Japan. 


Clinical Psychology 


The status of clinical psychology in 
Japan has changed greatly in the 
postwar period. The name “clinical 
psychology” is now in common use 
and the Society of Clinical Psychol- 
ogy has been formed. A journal of 
clinical psychology (Clinical Psy- 
chology and Educational Counseling) 
has recently been established, and 
now, for the first time, psychoan- 
alytic theories are becoming common 
subjects of discussion. Projective 
tests, such as the TAT and Ror- 
schach, play important roles in diag- 
nosis, and nondirective methods of 
therapy are being studied (55, 106, 
168). In all of this, the influence of 
American clinical psychology is felt
strongly, but in addition, some truly 
Japanese methods, such as Morita's 
(104), which are based upon a syn- 
thesis of Zen doctrines and western 
studies, are regarded favorably.
Without doubt, clinical psychology is
developing rapidly as an important 
field of endeavor in Japan. 


Industrial Psychology 


Although industrial psychology in 
Japan has not developed to the same 
degree as in America, it has never- 
theless been shown that a basic core 
of subject matter, technique, and 
practice did exist prior to World War 
II. During World War II problems 
of training and efficiency became 
serious, and a number of psychol- 
ogists worked in the Army and Navy, 
especially in the selection of aviators. 
Since the war, labor problems (train- 
ing and worker morale, especially) 
have received more attention than 
previously. Several prefectures have 
established research institutions for 
work in this area, and a few universi- 
ties have set up chairs of industrial 
psychology and vocational guidance. 
Work in selection shows little prog- 
ress, for in a land with 85 million 
people and disproportionately in-
adequate natural resources, such
procedures seem indeed to be fanci- 
ful. 


Social Psychology 


After the war many social prob- 
lems forced themselves upon the at- 
tention of psychologists. At the same 
time, the theories and experiments 
of social psychologists in America 
flowed into Japan and stimulated the 
Japanese workers. In consequence, 
work in social psychology is now be- 
ing carried on at an accelerated pace. 
At the annual meeting of the Jap- 
anese Psychological Association in 
1952, about 40 studies were reported
by social psychologists and, if re- 
lated researches are added to this 
total, the number becomes consider- 
ably greater. 

In the seven years since the war, 
the topics of central interest have 
changed. At first, considerable ef- 
fort was spent on the analysis of 
social and cultural phenomena (3,
32) that occurred during and after 
the war. Later, attention was paid 
to problems of group dynamics in con- 
nection with the democratization of 
the Japanese people. More recently, 
problems of social perception, com- 
munication (28), and the measure- 
ment of attitude became a central 
problem not only for social psychol- 
ogists but for many others as well. 
Most recently, psychologists have 
become interested in systematiza-
tion (Ikeuchi [31], Suenaga [170],
and others). Problems have been 
restated, and the possibility of treat- 
ing social behavior mathematically 
has been examined. Several re- 
search groups have been formed, 
for example, the Society for the 
Research in Behavior Mechanisms 
(at Tokyo), the Association for 
Group Dynamics (at Kyushu), and 
the Society for Youth Work (at 
Tokyo). As yet, no special journals 
have been established in social psy- 
chology, but a number of systematic 
treatises have appeared lately, no- 
tably those by Minami (94) and Shi- 
mizu (163). 


OVERVIEW 


Our survey points to the fact that 
psychology in Japan, like psychology 
in most countries, did not develop 
rapidly during the first 40 years of 
its existence. Circumstances im- 
proved after World War I, and after 
1926 psychology advanced in a surer 
fashion. By 1935 it gave promise, not 
only of becoming stronger but also of
becoming more diversified. Unfor- 
tunately, its favorable growth was 
severely inhibited by the events that 
led to the Sino-Japanese conflict and 
World War II. In effect, the war
period became an interval of stagna- 
tion. Following the war, Japanese 
psychology underwent the readjust- 
ment common to all areas of Jap- 
anese life. At the present time, it 
shows signs of development to a new 
and stronger level. 


SOME OBSERVATIONS ON CURRENT
ASPECTS OF JAPANESE PSYCHOLOGY 


It is certain that the picture of pres-
ent-day psychology in Japan differs 
in some details from the one that 
exists in America. The following dis- 
cussion is aimed at acquainting the 
American reader with some impor- 
tant facets of psychological endeavor 
in Japan. It is not intended that the 
discussion will deal with “problems.”
Rather, it is hoped that the descrip-
tion of selected areas of difference
between Japan and America will
acquaint the western reader with 
some facts that are essential to an 
understanding of psychology in a dif- 
ferent culture. 


Training 


Under the Occupation (1945-1952) 
a strong effort was made to remodel 
the educational system of Japan.²
All aspects of education felt the im- 
pact of the attempt, and for a while 
the training of psychologists came 
under scrutiny. For purposes of this 
discussion we shall speak of the pre- 
Occupation system as the “old” sys- 
tem and the program suggested by 
Occupation authorities as the “new”
system. 


2 We are indebted to Dr. D. D. Smith,
Office of Naval Research, Washington, D. C.,
for information on certain educational policies
of the Occupation.

The old system had a number of
the characteristics of the British 
system. Education leading to grad- 
uate work was based upon the 6-5-3-3 
system: 6 years in primary school, 
5 years in middle school, 3 in pre- 
paratory school, and 3 in the uni- 
versity. On leaving junior college 
(cf. p. 444) the prospective university 
student had to take entrance exam- 
inations which covered many fields; 
each university provided its own 
examination. If a student passed the 
written examination, he was called in 
for an oral examination and an inter- 
view. It is important to notice that 
university study began in the 14th 
school year as in England, not in the 
12th school year as in America. 

During his university days, a stu- 
dent might concentrate in psychology. 
The courses required in the psychol- 
ogy program varied somewhat from 
university to university, but, in gen- 
eral, they seem to have covered the 
conventional range of subject matter 
despite such differences as may have 
existed among, for example, Kyoto, 
Tokyo, and Keio. Kyoto had (and 
has) a requirement in philosophy 
that does not exist at Tokyo, and 
Keio emphasizes work in natural sci- 
ences. 

Occupation authorities made an 
attempt, under the new system, to 
change the method of selecting uni- 
versity students for national univer- 
sities. (This change constituted a 
small segment of a program to recon- 
stitute the educational system into a 
6-3-3-4 sequence, analogous to the 
American scheme: 6 years of primary 
school, 3 of lower secondary, 3 of up- 
per school, and 4 of university.) It 
was planned under the new system 
that university applicants should 
take the same examinations at ap- 
proximately the same time through- 
out the nation. After a student had 
“passed” the national aptitude test,
it was planned that he be invited to
the university of his choice for spe- 
cial examinations prepared by that 
university. If the student passed the 
examinations, he was to be inter- 
viewed. If he did not pass, he was to 
be permitted to have his files sent 
to one more national university. Pri- 
vate universities planned to operate 
differently, but the projected pattern 
was similar in many respects to the 
one here outlined. It turns out now 
that ways of utilizing the national 
aptitude test differ among different 
universities. Some universities use 
these tests as screening devices, and 
some use them only to provide addi- 
tional advisory material. 


Under either the old or new system, 
a student who graduated from a uni- 
versity could enter upon graduate 
work. Under the old system, no spe- 
cial courses were required during the 
period of graduate training. Once the
undergraduate courses were com-
pleted the student could apply for 
admission to the dean of the depart- 
ment and to the Faculty of Litera- 
ture (under which the psychology de- 
partment usually exists). Usually, 
as at Tokyo, entrance examinations 
were required only for people who 
came from other universities. Under 
these circumstances it turned out 
that the graduate students in a 
given university had usually been 
undergraduates in the same uni-
versity. Very often the student
stayed on as an assistant to the pro- 
fessor for many years.³

The new program did not say much 
about selection procedures, but it 
did make the proposal that 30 credits 
of course work be required in the first 
year of graduate study. These cred-
its were to be accumulated in the
areas of experimental, clinical, social,
developmental, educational, and cer-
tain optional areas. So far, the new
program has not received as much
support as was expected. In 1953
about one-third of entering graduate
students embarked on the new course.

3 Most large Japanese universities are, in
fact, not freely open to graduates of other uni-
versities. Most scholars remain in the same
school from their undergraduate days until
they achieve professorial rank.

The main argument against the 
new program is that it is expensive 
both for the university and for the 
student. Under the old plan, gradu- 
ate students often received salaries 
as assistants, and the number of as- 
sistants required in a university par- 
tially determined the number of 
graduate students. Under the new 
plan, a student who has to devote his 
first year to study is not likely to re- 
ceive an assistant’s fee.⁴

Whatever may be its fate at other 
strata of the educational structure, 
the new system has a “hard row to
hoe” at the level of graduate work. 
For this reason, it may here be more 
realistic to restrict the present dis- 
cussion of graduate training to the 
old system, unless otherwise specified. 

In general, few examinations are 
given during a man’s graduate work, 
and the requirements for reports on 
research vary from university to 
university. At Tokyo and Kyoto a 
research report must be made once a 
year, but at some other universities 
this requirement does not exist, the 
final thesis being the criterion for 
completion of the work. As had been
said, the major professor decides
when a man will finally stop his
training. A doctorate degree is
granted, if it is granted at all, usu-
ally after many years of professional
work. It is based upon a defense of
the thesis.

4 Where the new program does exist it has
often been confusingly intermixed with the
old program. The result is that few students
take the graduate courses required by the
new program to arrive at a possibly somewhat
higher level than that specified by the M.A.
degree in America. In the universities of
Tokyo and Kyoto, for example, the number
of students entering the graduate courses is
only four a year. This number, of course, is
small in comparison with the number of stu-
dents who receive the master’s degree every
year in America.


What opinions do Japanese psy- 
chologists have about ways of im- 
proving graduate training? The fol- 
lowing proposals were frequently en- 
countered and seem to represent the 
prevailing views of the Seminar partic- 
ipants: (a) establish a program lead- 
ing to a terminal doctor's degree ob- 
tainable after a reasonable period of 
graduate study;⁵ (b) differentiate de-
partmental offerings into social, ap-
plied, and experimental psychology;
(c) appoint more professors in more
diverse areas of psychology;⁶ (d) im-
prove the general level of undergrad-
uate instruction; (e) specify formal
prerequisites for graduate training in
each area of psychology; (f) provide
more adequately supervised work in
the laboratory and in the clinic; (g)
improve facilities for research and
graduate training; (h) place psychol-
ogy in some other faculty than the
Faculty of Literature. 

5 In Japan the M.D. degree is given after
several years of research. “Bungakuhakase”
(literally, Doctor of Literature, correspond-
ing to the western Ph.D.) is not usually given
until a psychologist approaches his fiftieth
birthday.

6 Two professorships in psychology exist in
each of the two most recently founded univer-
sities, Hokkaido and Osaka. Tokyo Uni-
versity has only recently obtained a second
professorship. The universities of Kyoto,
Tohoku, and Kyushu have only one chair.
This situation is to be contrasted with the
one that exists in philosophy. At Kyoto, for
instance, there are seven chairs of philosophy.
This ratio of philosophy professors to psychol-
ogy professors has remained unchanged since
1906.

Some of the proposals may be car-
ried out. It is possible, for example,
that psychology may vacate its
place in the Faculty of Literature.
Other members of the Faculty even 
now accept the fact that psychology 
is misplaced. Certain government 
officials seem to favor the reclassifica- 
tion of psychology, but others do 
not, probably on the grounds that, as 
a recognized laboratory and clinical 
discipline, it will become more ex- 
pensive. 

While external agents are helping 
determine the position of psychology, 
psychologists themselves can do a 
great deal in the way of advancing 
training procedures. They can, for 
example, agree on well-chosen policies 
concerning the effective use of pres- 
ent facilities. (Facilities are, in fact, 
adequate for many purposes, and 
much excellent work can be produced 
with them. In a few universities [e.g.,
Kwansei Gakuin] the equipment used
in prevailing research is good.)⁷
Above all, establishing policies aimed 
at producing broadly trained psy- 
chologists, who are skillful in the use 
of research techniques, will constitute 
an important step.⁸


Professional Opportunities 


The lot of a psychologist in Japan 
has many desirable features, but, 
financially, it is no more (but prob- 
ably no less) attractive (in view of 
prevailing economic conditions) than 
that of many academic people in 
other lands. A full professor may 
earn $70 to $100 per month; an assist- 
ant professor, $50 to $90; an instruc-
tor, $20 to $85; and an assistant about
$20. (The salaries quoted include
appropriate allowance for housing,
etc.) The few jobs available in edu-
cation, government, and industry
probably provide slightly higher max-
imum salaries than the academic
positions.

7 Laboratory equipment was, in general,
not obtainable during the war. In conse-
quence much of the equipment, even in the
relatively new laboratories of Tohoku and
Kyushu universities, is old. The provision of
funds for laboratory equipment is not yet
adequate.

8 See, for general background information,
an article by M. Imada: Recent psychological
thinking in Japan. Ann. Proc. Dept. Psychol.
Kwansei Gakuin Univer., 1954, 1, 1-9.

A main flaw in the professional 
picture lies in the fact that profes- 
sional psychology is relatively un- 
developed in Japan. Psychology has 
not been exploited by government, 
education, and industry, as has been 
the case in the United States, and 
few positions exist for psycholo-
gists outside of universities. Until,
through the years, a demand is felt
for psychological work in other places
than psychology departments, it is
unlikely that professional opportuni-
ties will expand greatly. It now re-
mains for psychologists to exploit
circumstances in which the fruitful-
ness of psychological methods may
be shown to advantage.


Problems of Communication


Psychologists in Japan feel iso-
lated, and, in fact, they have been
isolated, not only from western psy- 
chologists but, to a greater degree 
than they desire, from each other. 
The isolation from western psy-
chologists is attributable to a great
many factors, among which geo- 
graphic position plays a minor role. 
The lack of contact with the western 
world that existed from 1937 to 1945 
had its undesirable effects, and the 
Occupation years (1945-1952) that 
followed constituted a period of read- 
justment that did not provide the 
best circumstances for scientific give- 
and-take. (During the Occupation a 
number of American educational and 
military psychologists worked with 
Occupation officials, and, in fact, did 
establish good contacts with Japa-
nese workers. However, these pleasant
relations remained indirect in most 
cases, because the Americans usually 
worked in administrative positions. 
It is to be hoped that more oppor- 
tunities for close personal relations 
in the seminar and laboratory, such 
as existed during the Kyoto Seminar 
in Experimental Psychology, will be 
established soon. It is encouraging to 
observe that the appointment of 
George M. Haslerud as Fulbright 
Professor at Kyoto and the visit of
Koiti Motokawa to the United States 
during 1953-1954 signify steps in this 
direction.) 

The barrier of language has, as 
much as any other factor, served to 
confirm the isolation of the Japanese. 
Despite the fact that Japanese train- 
ing in foreign languages is good, the 
training of western workers in Jap- 
anese is bad. It has been the lot of 
Japanese psychologists since their
beginnings to do research which re- 
mains little known to the outside 
world. In consequence, Japanese psy- 
chologists have had to depend on the 
approval of their fellow countrymen. 
This situation may certainly have 
been attended by undesirable effects, 
which, fortunately, now seem to be 
disappearing. 

Certain devices for increasing com- 
munication might be tentatively 
recommended without commitment 
as to their ultimate value. For ex- 
ample, Japanese psychologists could 
be encouraged to publish in other
languages than their own. (It is
worth observing in this connection 
that, even now, Japanese psychol-
ogists, feeling a need to converse
more effectively with their western
colleagues, usually add extensive
English abstracts to their research 
reports.) Other devices such as inter- 
cultural seminars, exchange scholar- 
ships, exchange professorships, inter-
cultural institutions, etc. might pro-
duce more intimate scientific rela-
tions. Certainly such programs 
should be encouraged. 

During the war and for some time 
thereafter, Japanese psychologists 
had little contact with each other. 
The Japanese Psychological Associa- 
tion may well ask itself what meth- 
ods may be used to increase com- 
munication among its members. The 
planning of symposia on topics of 
general interest might be suggested 
as an area in which the Association 
can make effective contributions, and 
there are indications that action will 
be taken along this line. Finally, it 
is worth observing that devices that 
improve intercultural relations can 
also improve contacts among the 
Japanese. One of the chief merits of 
the Kyoto Seminar lay in the fact 
that it attracted psychologists from
widely separated areas of Japan and 
provided an appropriate focus for dis- 
cussion and mutual acquaintance. 


SUMMARY 


An account of Japanese psychology 
is presented, first from a historical 
point of view, and second from the 
point of view of analyzing some of its 
modern aspects. 

The historical sequence is broken 
up into three parts: from the begin- 
nings of psychology (about 1880) un- 
til 1926, from 1926 through World 
War II, and from the end of World 
War II to the present. Some current 
aspects manifested by Japanese psy- 
chology are considered. The discus- 
sion of these matters centers about 
the topics of training, professional 
opportunities, and communication.


The purpose of this review is to
describe the leaderless group discus- 
sion (LGD): its history, applicability, 
method of administration, and relia- 
bility and validity as a technique for 
assessing leadership potential. We 
shall discuss the major variants in 
procedure that have been tried, and 
indicate the effects of these variations 
on LGD reliability and validity. We 
shall not consider the initially leader- 
less discussion’s role as a basic re- 
search tool for studying the develop- 
ment of leadership as it has been 
employed by investigators such as 
Carter (29, 30, 31, 32), Pepinsky, 
Siegel, and Vanatta (61), and Bell
and French (24), although we will
make use of their findings wherever 
they shed light on the LGD as an 
assessment device. 


HISTORY OF THE LEADERLESS 
GROUP DISCUSSION


According to Ansbacher (2), the 
originator of the method was J. B. 
Rieffert, who directed German mili- 
tary psychology from 1920 to 1931. 
The technique, called by first users 
the Schlusskolloquium or Rundge-
spräch, was aimed at showing be-
havior “toward equal partners.” The
German Army used the procedure 
until about 1939, while the Navy 
continued employing it in their selec- 
tion programs until late in World
War II. Various German civilian
agencies now appear to be employing
the LGD as an assessment tool.

1 Several parts of this article were sum-
marized for presentation at a Symposium on
Situational Performance Tests, Sixty-first
Annual Meeting of the American Psychologi-
cal Association, Cleveland, Ohio, September
7, 1953.

Influenced by the German de- 
velopments in situational testing, by 
1942 the British War Selection Board 
introduced such tests into their bat- 
tery for selecting Army officer candi- 
dates. The basic series of leaderless 
group tests was evolved by Bion and 
included the LGD (41, 45, 70). A
similar program was established by 
the British Navy (70). 

At the end of the war, the LGD 
was employed by Fraser (39, 40) as a 
device for screening British manage- 
ment trainees, and by Vernon (68, 
69) for testing British Civil Service 
applicants. 

Similar developments took place in 
Australia (43, 65), South Africa (3), 
and Norway.²

The OSS Assessment Staff (59) ap- 
pears to have initiated use of the 
LGD in the United States late in 
World War II. American federal and 
state civil service examiners began 
trying out the technique at the end of 
the war (5, 26, 27, 53). 

Approximately 25 per cent of the 
190 Civil Service agencies surveyed 
by Fields (37) reported using the 
LGD. A Federal Civil Service man- 
ual appeared in 1952 (56). The LGD 
has also been employed by several 
American industrial and business
firms. Its rapid acceptance has led 
both Meyer (58) and Douglas (35) to 
caution about overestimating its 
validity and utility. 

2 Private correspondence from V. C. Jahl,
Chief Psychologist, Norwegian Armed Forces.

We attribute the use of the LGD
primarily to its relative ease of ad- 
ministration compared to individual 
interviews, especially where large
numbers of applicants are involved, 
as well as to its face validity com- 
pared to paper-and-pencil tests. 

It may be that the face validity of 
the LGD really is what Gulliksen 
(44) has labeled intrinsic validity. A 
number of extrinsic validity studies 
are now available which may justify 
ex post facto the possibly widespread 
premature acceptance of the LGD as 
a valid measure of the tendency to 
display successful leadership. 

Several investigators have assumed 
that the LGD is intrinsically valid 
as a means of appraising successful 
leadership behavior. Pepinsky, 
Siegel, and Vanatta (61) have used 
LGD performance as a criterion for 
evaluating the effects of counseling; 
Bell and French (24) and Carter and 
his associates (29, 30, 31, 32) have 
assumed that they are studying lead- 
er behavior when observing discus- 
sions. 

Another reason for the continuing 
interest in employing situational tests 
to forecast or appraise leadership
behavior may be due to the nature of 
leadership and psychometrics. 

In reaction to the earlier emphasis 
on the effects of individual differences 
on leadership behavior, there has 
been, more recently, an emphasis on 
situational effects on leader behavior.
That both sources of variance are
important for leadership behavior
can be accounted for fully only after
analyzing the main effects and the
interaction effects of situational dif-
ferences and individual differences
in motivation, behavioral history,
and biological level (maturity, hered-
ity, and integrity of the CNS).³

3 This formulation was originally proposed
by W. P. Hurder.

It follows that any test designed to
forecast leadership potential and in- 
tended to have some generality of 
application would need to: (a) share 
elements in common with a variety 
of situations; (b) vary directly with
the situation for which predictions are 
to be made. 

Most psychometric test procedures 
have been developed to meet only the 
first requirement, for many behaviors 
are relatively little influenced by 
situational changes. Thus, one’s 
speed-of-arm movement or spatial 
visualization accuracy remains fairly 
constant over a wide range of situa- 
tions. On the other hand, since the 
effects on leadership behavior of situ- 
ational change are large, psycho- 
metric methods like the LGD, and 
other situational tests which meet 
both requirements, would appear to 
offer more promise initially as psy- 
chometric techniques for forecasting 
leadership behavior. 

With reference to the first require- 
ment, the LGD, in common with a 
range of other situations in which 
leader behavior is to be appraised or 
predicted, appears to share such ele- 
ments as the need for would-be 
leaders: to communicate effectively, 
to overcome inertia, to solve various 
interaction problems, to meet dead- 
lines, and to reach consensus. 

With reference to the second re- 
quirement, the LGD and other situ- 
ational tests, by their very construc- 
tion and administration, tend to vary 
consistently with the nature of the 
examinees and the real-life situations 
for which personnel are being chosen. 
Candidates for positions of leader- 
ship are assessed among administra- 
tors by observing them solving ab- 
stracts of administrative problems 
among administrative trainees. Tests 
of the same individuals for positions 
as Army officers would involve
studying them solving military prob-
lems during interaction with officer
candidates. To some extent, there-
fore, situational tests may be able
to take situational variations into
account.


APPLICABILITY OF THE LEADER-
LESS GROUP DISCUSSION


According to published reports, 
the LGD has been used to assess can- 
didates for many professions and 
occupations. The examinees have 
included: officer candidates (2, 41, 
43, 45, 70, 71); OSS agent applicants 
(59); advanced Naval (31, 32), Air 
Force, and Army ROTC cadets (15); 
industrial management trainees (39, 
40, 65); industrial executives and su- 
pervisors (21, 22); shipyard foremen
(53); supervisory labor mediators
(42); civil service supervisors and
administrators (3, 54, 68, 69); ap-
plicants for foreign service (68, 69);
graduate engineer trainee applicants
(9, 10); sales trainee applicants (9,
10); public health physicians (5, 26,
27); teachers (47); visiting teachers
(51); and social service workers
(52).


DESCRIPTION OF THE LEADERLESS
GROUP DISCUSSION

The basic scheme of the LGD is 
to ask a group of examinees, as a 
group, to carry on a discussion for a 
given period of time. No one is ap- 
pointed leader. The examiners do not 
participate in the discussion, but
remain free to observe and rate the
performance of each examinee. To
date, there has been little standard-
ization by all examiners of the num-
ber of discussants per group, the
length of testing time, type of prob-
lems, if any, presented to the candi-
dates, and the directions given to
them. Also, the number of raters has
varied, as has the seating arrange-
ment. Examiners have differed in
the kinds of behavior they have ob- 
served and rated and in the extent 
to which their ratings have been at- 
tempts to describe the behavior they 
have observed, rather than attempts
to make inferences about the per- 
sonality of the candidates. 


Factors Rated by Observers 


Unless otherwise indicated, the 
LGD results we shall describe are 
either based on observers’ ratings of 
the amount of successful leadership 
displayed⁴ in the discussion, or are
inferences about the personalities of 
the candidates based on observations 
of this behavior. 

In regard to personality inferences, Couch and Carter (34), following a
factorial analysis of observers’ inferences based on several kinds of
situational tests including the LGD, found that three independent
factors could be isolated which accounted for situational test ratings.
The factors and the ratings with highest loadings on each factor were:
(a) Individual Prominence (authoritarianism, confidence,
aggressiveness, leadership, striving for recognition); (b) Group Goal
Facilitation (efficiency, cooperation, adaptability, pointed toward
group solution); (c) Group Sociability (sociability, adaptability,
pointed toward group acceptance).

4 A leadership act is said to occur when member A of a group behaves in
a way directed toward changing another member, B’s, behavior. More
specifically, a leadership act occurs when A’s behavior is directed
toward: (a) changing the intensity and/or direction of B’s motivation,
and/or (b) restructuring B’s abilities to cope with the situation and
reduce B’s needs (13).

All attempted leadership acts in which A reaches his goal of changing B
are considered successful leadership. If B’s change in behavior brought
about by A’s successful leadership leads to need satisfaction for B and
for A (apart from A’s satisfaction in being a successful leader), A’s
successful leadership act is considered effective (46).


As will be pointed out later, many 
of these specific inferences by obser- 
vers, such as those concerning au- 
thoritarian tendencies, may be valid 
only as descriptions of performance in 
leaderless groups and actually may 
be inversely related to attitudes and 
performance in real life.⁵

5 For a discussion of leadership and authoritarianism, the reader is
referred to Hollander (48).

A similar factorial analysis of OSS situational test data by Sakoda
(62) uncovered three factors which resemble Couch and Carter’s: (a)
Physical Energy (energy and initiative, physical ability, leadership);
(b) Intelligence (effective intelligence, observing and reporting,
propaganda skills); (c) Social Adjustment (social relations, emotional
stability, security). Carter (29) noted that similar factors appeared
in several studies when real-life leader behavior was rated and factor
analyzed.

In aiming to estimate successful leadership behavior, the descriptive
check lists of behavior in leaderless discussions that evolved from
various studies made by the author and associates appear to concern
primarily what Hemphill (46) has labeled “Initiation of Structure,” a
factor which has emerged in several of the Ohio State Leadership
Studies (e.g., 38) and which has similarities to Couch and Carter’s
“Group Goal Facilitation.”

In the check list used in most studies by the author and his co-workers
to assess leader behavior, raters are asked to indicate whether each
candidate showed the following behaviors “a great deal” (4 pts.),
“fairly much” (3 pts.), “to some degree” (2 pts.), “comparatively
little” (1 pt.), or “not at all” (0 pts.): (a) showed initiative; (b)
was effective in saying what he wanted to say; (c) clearly defined or
outlined the problems; (d) motivated others to participate; (e)
influenced the other participants; (f) offered good solutions to the
problem; (g) led the discussion. The sum of ratings received on all
seven items is the examinee’s performance score.
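
The scoring rule just described (seven behaviors, each rated 0 to 4 and summed) can be sketched in a few lines. This is only an illustration: the item keys and the function name `lgd_score` are hypothetical labels paraphrasing behaviors (a) through (g), and the sketch covers a single rater; how scores from several raters were combined is not specified here.

```python
# Hypothetical short labels for the seven check-list behaviors (a)-(g).
ITEMS = (
    "initiative", "effective_expression", "defined_problem",
    "motivated_others", "influenced_others", "good_solutions",
    "led_discussion",
)

# Point values for the five response categories given in the text.
POINTS = {
    "a great deal": 4,
    "fairly much": 3,
    "to some degree": 2,
    "comparatively little": 1,
    "not at all": 0,
}


def lgd_score(ratings):
    """Sum one rater's seven item ratings into a performance score (0-28)."""
    missing = [item for item in ITEMS if item not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    return sum(POINTS[ratings[item]] for item in ITEMS)


# Illustrative ratings, not data from any study.
example = dict.fromkeys(ITEMS, "fairly much")
example["led_discussion"] = "a great deal"
print(lgd_score(example))  # six items at 3 points plus one at 4 points = 22
```

With this scale an examinee's score from one rater can range from 0 to 28.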

In an unpublished study by Pruitt and the author, seven items were
rated which aimed at assessing the extent to which each LGD participant
showed “consideration for the welfare of his associates,” a factor of
leader behavior uncovered in various Ohio State Leadership Studies
(38). This factor is similar to Couch and Carter’s Group Sociability
and Sakoda’s Social Adjustment factors. The seven items included: (a)
engaged in friendly jokes and comments; (b) made others feel at ease;
(c) complimented others; (d) helped others; (e) encouraged others to
express their ideas and opinions; (f) had others share in making
decisions with him; (g) helped settle conflicts. At the same time, a
list of seven “Initiation” behaviors was observed and rated.

The intercorrelation between 80 LGD participants’ Initiation and
Consideration scores was .78, too high to warrant continued use of both
assessments. More significantly, the mean rating on any single
Initiation behavior was 2.2 points, while the mean rating on any single
Consideration item was 0.75 points. Thus, although raters could assess
reliably both types of leader behavior (r_tt = .90, .85), much less
Consideration behavior appeared, and most of it was exhibited by
Initiators.

On the basis of this, and evidence to be presented later, we suspect
that the LGD rating is more an assessment of the tendency to initiate
structure in an initially unstructured social situation (one of several
types of successful leadership behavior) than an assessment of
tendencies to be considerate.


Conditions Affecting LGD Leadership 
Ratings and Performance 


A number of variations have been 
systematically studied which may or 
may not seriously affect ratings of 
successful leadership in the LGD. 
These include: the size of the group, 
the seating arrangement, pretest 
coaching, the motivation of the par- 
ticipants specific to the situation, and 
member participation. 

Effects of size. As the size of the 
group increases from two to twelve, 
the mean LGD rating assigned is 
reduced approximately 50 per cent. 
Eighty-three per cent of the variance 
in ratings where members come from 
groups of 2, 4, 6, 8, and 12 is ac- 
counted for by size according to a 
study of 120 examinees by the au- 
thor and Norton (19). It appears 
that the opportunity to display suc- 
cessful leadership is closely associated 
with the size of the group. We con- 
clude that proper correction must be 
made in any LGD studies where dif- 
ferent examinees have been tested in 
groups varying in size. 
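
One simple form such a correction could take, offered only as a sketch and not as the procedure used in (19), is to express each examinee's score relative to the other examinees tested in groups of the same size. The function name, the record layout, and the sample numbers below are hypothetical. The approach also assumes that examinees were not assigned to group sizes on the basis of ability, since otherwise real differences would be removed along with the size effect.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_within_size(records):
    """Convert raw LGD scores to z-scores within each group-size stratum.

    records: iterable of (examinee_id, group_size, raw_score) tuples.
    Returns a dict mapping examinee_id to a standardized score.
    """
    records = list(records)
    by_size = defaultdict(list)
    for _, size, score in records:
        by_size[size].append(score)

    # Mean and (population) standard deviation for every group size.
    stats = {size: (mean(v), pstdev(v)) for size, v in by_size.items()}

    standardized = {}
    for examinee, size, score in records:
        m, sd = stats[size]
        # With no spread in a stratum, every member sits at the mean.
        standardized[examinee] = 0.0 if sd == 0 else (score - m) / sd
    return standardized


# Illustrative numbers only: raw scores run lower in the larger groups.
records = [("a", 2, 20), ("b", 2, 24), ("c", 12, 8), ("d", 12, 12)]
print(standardize_within_size(records))
```

In this toy example examinees "a" and "c" receive the same standardized score even though their raw scores differ by a factor of more than two, because each is equally far below the mean of examinees tested in groups of the same size.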

Effects of location of seat and seating arrangement. Sixty-eight
discussions among 467 participants were analyzed by the author and
Klubeck (16) to determine the effects of the particular seat a
participant held on the LGD rating he obtained. For both V-shaped
arrangements and those in which members sat in parallel rows facing
each other, members seated at the ends obtained slightly higher mean
scores. In two sets of the seven studies included in this analysis, the
results were significant at the 1 per cent level. The effects tended to
disappear when variations in the real-life esteem of the members were
held constant. At any rate, the differences, statistically significant
or not, associated with participants’ location in the group were too
small to be of much practical concern.

Pretest coaching. If LGD perfor- 
mance can be greatly altered by 
means of brief coaching which does 
not reflect any real change in per- 
sonality, its routine use by large 
organizations such as the Armed 
Forces for screening OCS applicants 
would be impossible. Therefore, 
Klubeck and the author (50) briefly 
coached the third highest and the 
sixth highest participants (among 
seven) of each of 20 leaderless dis- 
cussions and then ran retests on each 
group of seven. An analysis of co- 
variance led to the inference that 
while those who were fairly high in 
LGD score initially profited signif- 
icantly from coaching, those who 
were initially low did not profit at all 
from the brief coaching. While the 
shift upward of the high ranking 
subjects was statistically significant, 
it was not very large in an absolute 
sense. 

The investigators cited a number 
of reasons for rejecting the inference 
that the differential improvement was 
due to differential motivation, and 
concluded that LGD behavior is a 
function of personality traits and 
needs which cannot be altered readily 
by brief coaching. 

This interpretation was in agreement with Harris’ (45) opinion that one
could not “cram” for the leaderless group discussion. Harris suggested
that priming a candidate, rather than help, would most likely handicap
him by “inhibiting his spontaneity.”

In an unpublished study, Pruitt and the author gave the same
part-directive, part-permissive coaching as Klubeck and the author had
administered previously, but the groups receiving training were coached
for 15 minutes while assembled as groups prior to undergoing the LGD.
Directly in line with Harris’ hypothesis, the five trained groups, each
with six participants, showed significantly less “initiation” behavior
than eight untrained control groups of six each. The trained groups
exhibited only half as much “consideration” behavior as the untrained
groups. Raters commented on the “freezing up” and the “increased
nervousness and tensions” which characterized the trained groups; this
was in line with Harris’ hypothesis.

As pointed out by Klubeck and the author:

. . . long term training is obviously an entirely different matter.
Thus, if an ineffective individual underwent psychotherapy successfully
which led to favorable modifications in his needs and self-esteem,
there is no reason to reject the possibility that he would exhibit
improved performance on the LGD, but here the LGD would reflect real
personality change (50, p. 71).

Actually, Pepinsky, Siegel, and 
Vanatta (61) carried out such long- 
term training with some success. 

Effects of extrinsic motivation of participants specific to the
situation. Do momentary changes in the incentive to participate,
unrelated to the personality of the participants, make much difference
in LGD performance? An unpublished study completed by the author
tentatively suggests that added extrinsic motivation is relatively
unimportant in determining the behavior of LGD participants. Two small
samples of a total of 31 college students were tested and retested in
leaderless groups of seven to nine. The first sample was told that the
first test was mere practice and had no bearing on class grades, but
that the second test would count as an important grade determiner. The
second sample took the “important” examination first and was then given
the second test as a “check-up on the test” that would not affect them.
The means of successful leadership displayed in both the motivated and
unmotivated situations were practically identical. Moreover, the
correlation between LGD performance in both situations was .86,
indicating that the addition of incentives affected performance
relatively and absolutely very little.

Needs more basic to the personality probably energize and sustain LGD
behavior. These needs appear to be either openly denied or unconscious
to a great extent. When 227 ROTC cadets, in an unpublished study by the
author and Coates, were asked to indicate on a five-point scale how
much they tried to do as well as possible on an LGD, a correlation of
only .30 was found between reported effort and actual LGD scores
obtained.

Analogous to performance on paper-and-pencil aptitude tests, increasing
the extrinsic motivation of mature subjects does not serve to raise
scores materially, since subjects usually perform near their maximum
without such added incentive. Of course, we might succeed in lowering
performance if we could sufficiently discourage subjects or increase
tension beyond some optimum point.

It is possible that the LGD may be no more sensitive to variations in
examinees’ extrinsic motivation than most aptitude tests. It is
probable that the LGD is less affected by such extrinsic motivation as
the desire to obtain a job than is the usual undisguised personality
test.

What is needed are comparative studies of operational, as opposed to
experimental, validities of the LGD.

Amount of participation. Analyses generally find a high correlation
between the sheer amount of talk of LGD participants and the scores
they earn for successful leadership. Time spent talking correlated .65
with ratings of success for 64 sales and management trainee candidates
(10), .77 for 140 sorority girls (23), .93 for 20 college students (6),
and .96 for 36 college students in an unpublished study by R. L.
French.⁶

6 French, R. L. Verbal output and leadership status in initially
leaderless discussion groups. Amer. Psychologist, 1950, 5, 310.
(Abstract)

This high correlation is disturbing at first, since it suggests that
LGD ratings primarily discriminate the verbose from the terse. However,
the relationship can be shown by a series of deductions to follow
logically if we assume that almost all participation in the LGD is
attempted leadership behavior, that LGD ratings are assessments of
successful leadership behavior, and that attempted leadership must
occur in order for some of it to be judged successful. More detailed
discussions of these relationships are presented elsewhere (6, 13).

Kind of participation. Qualitative differences appear in the kinds of
LGD participation engaged in by the successful leader and by those who
participate and attempt leadership acts, but who nevertheless earn low
scores as successful leaders. When the responses during leaderless
discussions of 46 fraternity members were analyzed by the Bales
technique (4), other judges’ ratings of participants’ successful
leadership correlated .66 with frequency of attempting answers; .50
with frequency of positive socioemotional responses; .44 with frequency
of asking questions; and .32 with frequency of negative socioemotional
responses (12). The roles associated with successful leadership were
initiator-contributor, opinion-giver, elaborator, compromiser,
orienter, evaluator, energizer, encourager, etc. (11). Only one type of
participation, attempting answers, was associated (r = .48) with the
real-life esteem of the participants.

In a study of 40 ROTC cadets, 
Carter, Haythorn, Shriver, and Lan- 
zetta (32) found that LGD leaders 
were much more likely than non- 
leaders to diagnose the situation, ask 
for expressions of opinion, propose 
courses of action for others, support 
and defend their proposals, give in- 
formation, express opinions, and ar- 
gue with others. 


RELIABILITY OF THE LEADERLESS
GROUP DISCUSSION

The reliability of the LGD will be considered in two phases: rater
agreement and test-retest reliability. Wherever possible we will
indicate factors that systematically influence these reliability
estimates.


Rater Agreement 


Table 1 displays the average agreement found between any two observers
in rating the first LGD administered to the designated subjects. The
results suggest that the refined 7-item check list and its predecessors
of 9 and 14 items used in most of the studies by the author and
associates (e.g., 15) yield a consistent rater correlation of between
.82 and .84. LGD scores based on two raters using the check list method
yield a satisfactory estimated reliability of .90 or above.

A number of factors influence the agreement between raters. These will
be considered next.

TABLE 1
AGREEMENT BETWEEN ANY TWO RATERS

Subjects | What Rated | Rating Method | Correlation between Any Two Observers | Estimated Reliability of LGD Average Rating Using Two Observers
sales and mgt. trainee candidates (9) | Desirability for position | Paired comparison | …7 | .80
administrative trainee candidates (3) | Personality and ability (highly intercorrelated) | Graphic rating scale | … | …
male college students (30) | Leadership performance | Graphic rating scale | … | …
NROTC subjects (31) | Leadership, authoritarianism, initiative, and insight (highly intercorrelated) | Graphic rating scale | … | …
mixed college students (6) | Successful leadership displayed | Check list (13-item) | … | …
fraternity members (20) | Successful leadership displayed | Check list (14-item) | … | …
mixed college students (19) | Successful leadership displayed | Check list (9-item) | … | …
ROTC cadets (15) | Successful leadership displayed | Check list (9-item) | … | …
fraternity pledges (72) | Successful leadership displayed | Check list (9-item) | … | …
sorority members (23) | Successful leadership displayed | Check list (7-item) | … | …
mixed college students* | Initiation of structure and interaction | Check list (8-item) | … | …
mixed college students* | Consideration of others | Check list (8-item) | … | …

* Unpublished data.

Effects of discussion effectiveness. A pronounced curvilinear
relationship was found between the rated effectiveness of 67 leaderless
discussions and the correlation between two observers’ ratings for each
discussion. Rater agreement was highest for discussions of average
effectiveness and was lowest for discussions which were either
extremely effective or extremely ineffective. The average eta found for
four subsamples of these 67 discussions was .68 (17). It may be that
observers become too interested in the very effective discussions and
too bored or detached from the very ineffective ones; or, the variance
of LGD ratings may be reduced in discussions at either extreme of the
distribution of effectiveness, which, in turn, will serve to reduce the
reliability of the ratings.
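
Eta, the correlation ratio, is the usual index for a curvilinear relationship of this kind, since a product-moment correlation can be close to zero when agreement peaks in the middle of the range. The sketch below computes eta for discussions grouped into effectiveness bands. The banding, the function name, and the numbers are illustrative assumptions and are not data from (17).

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def correlation_ratio(categories, values):
    """Return eta: sqrt(between-category sum of squares / total sum of squares)."""
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        groups[category].append(value)

    grand_mean = mean(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # no variation at all, so nothing to explain
    ss_between = sum(
        len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values()
    )
    return sqrt(ss_between / ss_total)


# Hypothetical observer-agreement coefficients for six discussions,
# banded by rated effectiveness; agreement peaks in the middle band.
bands = ["low", "low", "average", "average", "high", "high"]
agreement = [0.6, 0.7, 0.9, 0.9, 0.6, 0.7]
eta = correlation_ratio(bands, agreement)
print(round(eta, 3))
```

Because eta depends only on how far the band means depart from the grand mean, it registers an inverted-U pattern that a linear coefficient would miss.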

Effects of size. The number of participants in a given discussion
appears to influence the extent to which observers will agree with each
other. The author and Norton (19) tested five samples of 24 subjects
each in discussions of two, four, six, eight, and twelve in size.
Maximum agreement among observers was reached in groups of six
(r = .89) and was lowest in groups of two (r = .72). Carter, Haythorn,
Meirowitz, and Lanzetta (31) found that mean observer agreement was
only .70 for four-man groups, while it was .85 when the same men were
retested in eight-man groups.

Effects of test or retest. The last results cited can also be accounted
for, in part, by the fact that observer agreement appears to increase
when the same subjects are retested. Thus, when the 120 subjects of the
author and Norton were all retested, observer agreement increased for
groups of all sizes from a mean correlation of .82 to a mean
correlation of .90.

Effects of the object of rating. The data summarized in Table 1 suggest
that rater agreement is higher where raters employ a check list in
which they merely indicate the extent to which each of a number of
items of leader behavior was exhibited by each candidate, rather than
where they employ a single graphic rating, or where they attempt to
make inferences about the standing of the examinees on intervening
variables of personality or ability supposedly underlying the LGD
behavior being observed. The median estimated reliability of check list
ratings is .90; the median estimated reliability of other types of
ratings is .80.
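
The relation between the correlation of any two observers and the estimated reliability of a score based on two observers is consistent with the Spearman-Brown formula. The text does not name the method used, so that attribution is an assumption; the sketch below simply shows that inter-rater correlations of .82 to .84 step up to about .90 when two raters are pooled.

```python
def spearman_brown(r_single, k=2):
    """Estimated reliability of a score pooled over k parallel raters,
    given the correlation r_single between any two single raters."""
    return k * r_single / (1 + (k - 1) * r_single)


# Inter-rater correlations quoted in the text for the check-list method.
for r in (0.82, 0.84):
    print(f"{r:.2f} -> {spearman_brown(r):.3f}")
# prints:
# 0.82 -> 0.901
# 0.84 -> 0.913
```

The parameter `k` is included only to show how the same formula would extend to more than two observers.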


Test-Retest Reliability 

The seven available test-retest reliability coefficients and the
studies on which they are based are listed in Table 2 in order of the
size of the coefficients. Listing them by size leads to the inference
that test-retest reliability is higher the more similar the test and
retest situations. The consistency of LGD behavior is higher the less
group membership changes from test to retest; the less the problem
changes; the less some members are increased in ability to lead; the
less some members are increased in “real-life” status; the less
observers are changed; and the shorter the time between tests, since a
longer interval permits more random or biasing change to occur among
participants. These results conform to the principle of consistency,
proposed by the OSS Assessment Staff, that a subject will respond to
similar environmental conditions in a similar manner (59).

Where changes in situation from test to retest are reduced to a
minimum, a high test-retest reliability is found. It is probable that
where behavior check lists describing participant behavior are used,
the true test-retest reliability of the LGD is somewhere between .75
and .90.

Effects of size. The number of participants in a group appears to
determine the consistency of the behavior from test to retest. For,
while the author and Norton (19) found groups from four to twelve in
size to have an average test-retest reliability of above .90, groups of
two had a corresponding reliability of only .46. These results suggest
that two-man and probably three-man leaderless group discussions should
not be used for assessment purposes.

TABLE 2
TEST-RETEST RELIABILITY OF RATINGS OF LEADERSHIP DISPLAYED IN
LEADERLESS GROUP DISCUSSIONS RANKED IN ORDER OF SIMILARITY OF TEST AND
RETEST

No. | N | Subjects | Rating Method | Differences between Test Situations | Interval between Tests | Test-Retest Correlation
1 | … | mixed college students (19) | Check list (9-item) | None | Week | .90
2 | … | mixed college students* | Check list (9-item) | One “important exam,” other not | Week | .86
3 | … | sorority members (23) | Check list (7-item) | Two members of each group of 7 given training between tests | … | …
4 | … | mixed college students (24) | Participants rank each other | Groups rearranged, six retests | … | …
5 | … | mixed college students (6) | Check list (13-item) | Intervening LGD among leader or followers only before second test | … | …
6 | … | mixed college students (21 | Check list (7-item) | Group rearranged, type of problem systematically varied: case history, no problem presented, and leader specification to be outlined | … | …
7 | 172 | ROTC cadets (15) | Check list (9-item) | Groups rearranged; some subjects changed in “real-life” status more than others between tests; different raters | … | …
8 | 36 | male college students (30) | Rating scale | Six intervening situational tests in groups of two before second test | … | …

* Unpublished data.
Entries shown as “…” are not legible in this copy. Legible fragments
that cannot be assigned to a row: intervals “… hours,” “Two days to
week,” and “Three to four months”; correlation “58.”


VALIDITY OF THE LEADERLESS
GROUP DISCUSSION

The construct validity of the LGD will be examined in this discussion.
This requires both logical and empirical review.

Figure 1 diagrams the relation- 
ships among a group of variables of 
importance in the study of leader- 
ship. Using a set of postulates based 
primarily on learning theory, the
author elsewhere (13) has deduced
these relationships, which may be
summarized as follows:


1. The more a member is able to solve the
group's problems because of personal charac- 
teristics (such as his capacity, achievement, 
responsibility, and participation), the more 
likely he is to exhibit successful leadership be- 
havior in real life, in quasi-real situations, and 
in the leaderless group discussion. These per- 
sonal characteristics are reflected in perform- 
ance on various psychological tests of intelli- 
gence, proficiency, and personality. 

2. The more a member exhibits successful 
leadership, the higher is his esteem among his 
associates—the extent to which he is regarded 
of worth as a member or leader to the group, 
regardless of his position—and the higher will 
be the merit ratings he receives as a successful 
leader or member. The higher his esteem, the 
more likely he is to be of further success as a 
leader among his associates. 

3. The higher a member's status, as inferred 
from his rank or the worth of his position 
among his associates, the more likely he is to 
successfully lead his associates. 


Further relations between vari- 
ables noted in Fig. 1 can be ignored 
here. 

If ratings of LGD performance 
are actually valid measures of tend- 
encies of individuals to differ in 
successful behavior, and if we accept 
as a logical rule that variables with 
common determinants should cor- 
relate positively with each other, 
then following the outline of Fig. 1, 
LGD scores should correlate:

1. With real-life rank (and hence real-life status) when the LGD is
among associates of different ranks;

2. With real-life merit ratings (hence real-life esteem);

3. With leadership performance in other quasi-real situations;

4. With observations and indices of successful leadership performance
in real-life situations;

5. With personal characteristics as measured by psychological tests and
measurements commonly associated with success as a leader.


Any positive correlation between an LGD rating and these other
specified measures should provide partial evidence of validity of the
LGD performance rating, namely that it actually measures leadership
potential or individual differences in tendency to be successful as a
leader. Previously presented evidence indicates that the measurement of
LGD performance is consistent with itself. The question still to be
answered is whether or not it is consistent with the various other
measurements associated with, described as, or defined as successful
leadership behavior. Rated LGD performance should be associated with
these other measures if it is an assessment of success as a leader.

We will now survey empirical investigations of the extent to which
rated LGD performance was found associated with status and esteem in
real life, personal characteristics, and leadership performance
elsewhere.


Status (as Estimated by Rank) and 
LGD Performance 


A biserial correlation of .88 was 
found between the rank in the com- 
pany of each of 131 oil refinery super- 
visors and their success as LGD 
leaders among their associates (22). 
The more the discussion problem con- 
cerned matters for which they had 
rank over their associates, the higher 
was this correlation (21). A corresponding
correlation of .51 was found
for 264 ROTC cadets.⁷ The lower
correlation in ROTC probably re- 
flected the fact that rank differences 
were less vital to the cadets than to 
the industrial executives. 

When 180 ROTC cadets were re- 
tested among their associates a year 
after an original test, those who rose 
during the year from cadet noncom 
to cadet first lieutenant or higher 
gained significantly more in LGD 
score on the retest compared to the 
test than those who received promo- 
tions to cadet second lieutenant only. 


7 Bass, B. M., & Coates, C. H. Situational and personality factors in
leadership in ROTC. Unpublished manuscript.

8 See footnote 7.


Esteem as Estimated by Real-Life Merit Ratings
and LGD Performance


Table 3 lists 17 correlations between LGD performance and
esteem-in-real-life as estimated by merit ratings. It also shows when
and how the ratings of merit were obtained. The median correlation is
.39 and is raised to .51 when only the seven cases in which correction
was made for the unreliability of esteem ratings are considered. As
shown in Table 3, LGD scores have been found moderately predictive of
merit as an ROTC cadet officer (15), sorority or fraternity member (20,
23, 72), civil service administrator (3, 69), shipyard foreman (55),
foreign service administrator (69), and OCS cadet officer (71). The
moderate correlations between LGD scores and real-life esteem for the
studies that involved discussions among strangers suggest that a common
source of variance among examinees, which exists beyond the effects of
situation, underlies an examinee’s merit among his real-life associates
and his success as a leader among strangers. The relationship between
esteem and LGD score cannot be attributed solely to the tendency of an
examinee to display successful leadership among associates who esteem
him. Thus, the LGD appears to assess attributes of the examinee that
are not specific either to the test situation or to the group in which
he is tested.
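
The correction for unreliability of the esteem ratings mentioned above is conventionally made with the correction for attenuation, applied to the criterion only. Whether the individual studies used exactly this form is an assumption, and the numbers below are illustrative rather than taken from Table 3.

```python
from math import sqrt


def correct_for_criterion_unreliability(r_xy, r_yy):
    """Estimate the predictor-criterion correlation that would be found
    if the criterion (here, merit ratings) were perfectly reliable.

    r_xy: observed correlation between LGD score and merit rating.
    r_yy: reliability of the merit ratings, between 0 (exclusive) and 1.
    """
    if not 0 < r_yy <= 1:
        raise ValueError("criterion reliability must be in (0, 1]")
    return r_xy / sqrt(r_yy)


# Illustrative values: an observed r of .40 against merit ratings whose
# reliability is .64 corresponds to a corrected r of about .50.
print(round(correct_for_criterion_unreliability(0.40, 0.64), 2))
```

The correction can only raise a coefficient, and the less reliable the merit ratings the larger the increase.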
Variables related to the correlation 
between esteem and LGD performance. 
The total amount of successful lead- 
ership displayed in a leaderless dis- 
cussion appears to reflect the average 
merit as leaders elsewhere of all the 
participants of the discussion. An 
analysis of 67 LGD’s among frater- 
nity pledges, sorority members, and 


®* Another interpretation of these findings 
has been offered by E. L. Kelly in the 1954 
Annual Review of Psychology (Stanford, Calif.: 
Annual Reviews, Inc., p. 295). Kelly suggests 
that to some extent, the real-life merit raters 
may react in the same way to the same cues 
irrelevant to leadership success, as do the 
LGD raters. Both assessments are in agree- 
ment, but the source of agreement does not 
necessarily concern leadership potential. 
Thus, in a given situation, real-life merit 
raters and LGD raters both may tend to as- 
sign high ratings to thin men, and logic and 
the literature on the subject suggest that 
thinness should have no relation to leadership 
potential. 

A counterargument is as follows: The real- 
life merit ratings, biased as they are, are a 
function of the extent to which the raters 
value or esteem the ratees. 
tends to have consequences affecting the con- 


The evaluation 


tinuing success of the performance under 
evaluation. If seemingly irrelevant cues such 
as thinness influence merit ratings in real 
life, they will then tend to be associated with 
esteem and leadership potential. In reacting 
to the same biasing cues, the LGD observers 
are in error as far as logic or psychologists are 
concerned, but, despite this, their error is asso- 
ciated with real-life leadership potential as 
well as with the biased real-life merit ratings. 

This same counterargument will not apply
where merit ratings have no future conse-
quences on the success of the actual perform-
ance being rated, such as in the case of ap-
praising diagnostic proficiency of physicians.

ROTC cadets found a correlation of
.35 between the mean LGD score per
discussion group and the mean es-
teem-in-real-life score of the partici-
pants of each group. Thus, there
was a between-groups as well as a
within-groups positive correlation be-
tween LGD ratings and esteem as
estimated by merit ratings. In the
same way, the LGD scores within a
discussion correlated .20 with the
merit scores among the participants
of that discussion (17).

These results suggest that LGD
ratings which depend solely on stand-
ards within a group discussion should
suffer in validity as estimated by cor-
relations with real-life merit ratings.
Rating techniques with this disad-
vantage include the forced distribu-
tion rating, paired comparison, rank-
ing, and any others which force the
equalization of all discussion-group
LGD score means and/or variances.
Similarly, when tested among strang-
ers, any ratings of each other by the
participants themselves will be at-
tenuated in validity as predictions of
esteem, since they will depend solely
upon standards based on observation
of a single discussion.
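
The loss of validity under forced equalization can be illustrated with a small numerical sketch. The scores below are hypothetical and are not data from any study cited here; the helper names are likewise illustrative. Subtracting each discussion group's mean stands in, as a simplification, for any rating technique that forces all groups to the same mean.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def center_within_group(scores, groups):
    """Subtract each group's own mean, so every group ends with mean zero."""
    means = {}
    for g in set(groups):
        vals = [s for s, gg in zip(scores, groups) if gg == g]
        means[g] = sum(vals) / len(vals)
    return [s - means[g] for s, g in zip(scores, groups)]

# Hypothetical data: two discussion groups; group B is stronger on both measures.
groups = ["A"] * 4 + ["B"] * 4
lgd = [2, 3, 4, 3, 6, 7, 8, 7]      # LGD ratings on a common standard
merit = [3, 2, 4, 3, 7, 6, 8, 7]    # real-life merit ratings

r_raw = pearson(lgd, merit)
r_forced = pearson(center_within_group(lgd, groups), merit)
print(f"common standard: r = {r_raw:.2f}")
print(f"group means equalized: r = {r_forced:.2f}")
```

In this constructed example most of the agreement between LGD and merit scores lies between the groups, so removing group means removes most of the correlation.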

This same analysis (17) indicated 
that a number of variables are associ- 
ated with the variation from discus- 
sion to discussion in the correlation be- 
tween real-life esteem of the members 
and their LGD performance. Accord-
ing to a Doolittle solution, these
variables included: within-discussion
variance in real-life esteem; within-
discussion variance in LGD ratings;
and group size. All relationships were
positive except for size. Six-man
groups were slightly more valid than
larger ones as predictors of real-life
esteem.

Status differences among members
almost completely invalidate the
LGD as an indicator of real-life es-
teem. Yet above and beyond these
effects, the case history discussion
appears more likely than other types 
of discussions to reflect differences 
in real-life esteem and personality. 
Where members of different rank 
were tested together, the case history 
discussion was the only one which 
yielded scores that correlated posi- 
tively with merit ratings as refinery 
supervisors (r=.28). Furthermore, 
case history discussion performance 
correlated .54 with a supervisory 
aptitude test battery. Other types of 
discussion, in these circumstances, 
averaged .23 in correlation with 
supervisory aptitude test scores, and 
correlated negatively with ratings of 
esteem of these supervisors of differ-
ent rank (21).


Personal Characteristics Associated 
with Leadership and LGD Per- 
formance 


According to Stogdill’s survey of 
over one hundred studies, leaders 
tend to surpass nonleaders in certain 
personal characteristics such as capac- 
ity (intelligence, alertness, verbal 
facility, originality, and judgment), 
achievement (scholarship and knowl- 
edge), responsibility and associated 
personality factors (dependability, ini- 
tiative, persistence, aggressiveness, 
self-confidence, desire to excel), and 
participation (activity, sociability, 
cooperation, and adaptability) (63). 
If these characteristics can be in- 
cluded under the broad concept of 
“abilities to solve group problems,” 
then their relationship to leadership 
can be deduced as well (13). Perform- 
ance in the LGD should be associ- 
ated with these personal factors, if it 
is to be judged valid as a measure of 
individual differences in tendency to 
exhibit successful leadership be-
havior. Table 4 shows the correla-
tions obtained between various meas-
ures of capacity and/or achievement
and LGD performance. Table 5
shows the correlations between LGD
performance and various personality
variables that approximate the “re-
sponsibility” and “participation”
clusters of Stogdill. The first three
items of Table 7 may be regarded as
further evidence of the correlation
between participation and LGD per-
formance. We shall briefly consider,
in turn, the correlations between
LGD performance and capacity,
achievement, responsibility, and par-
ticipation.


TABLE 4

CORRELATION BETWEEN LGD PERFORMANCE AND MEASURES OF CAPACITY AND ACHIEVEMENT

Subjects | Test of Capacity or Achievement | Factor(s) Most Probably Measured by Test or Measurement | Correlation
sorority members (23) | ACE Linguistic | Verbal aptitude | .35
fraternity pledges (72) | ACE Linguistic | Verbal aptitude | .32
sales and management trainee candidates (10) | | Verbal aptitude |
sorority members (23) | ACE Quantitative | Numerical aptitude |
fraternity pledges (72) | ACE Quantitative | Numerical aptitude |
ROTC cadets* | ACE Total | Verbal and numerical aptitude (intelligence) |
administrator candidates (68) | South African Air Force test of mental alertness | Intelligence |
oil refinery supervisors (22) | Otis Gamma | Verbal, spatial, numerical aptitude (intelligence) |
foreign service candidates (69) | Cognitive Test Battery | Intelligence |
administrator candidates (69) | Cognitive Test Battery | Intelligence |
foreign service candidates (69) | Civil Service Qualifying Exam | Intelligence and scholastic achievement |
administrator candidates (69) | Civil Service Qualifying Exam | Intelligence and scholastic achievement |
sorority members (23) | Years of education | Scholastic achievement |
oil refinery supervisors (22) | Years of education | Intelligence and scholastic achievement | .57†
ROTC cadets | Average grade in college | Scholastic achievement |
sorority members (23) | Average grade in college | Scholastic achievement |
oil refinery supervisors (22) | Supervisory aptitude test | Supervisory aptitude | .30‡
mixed college students (64) | How Supervise? | Supervisory knowledge | .46

* Unpublished data.
† Contaminated by the correlation between rank and test or measurement.
‡ Not affected by holding rank constant.

Capacity and leaderless group dis- 
cussion performance. While the cor- 
relation between LGD performance 
and verbal aptitude averages .30 
or above, as might be expected, the 
correlation between LGD behavior
and numerical aptitude is below .20.
Intelligence test scores measuring
verbal, numerical, and spatial fac- 
tors tend to fall in between in cor- 
relation with LGD performance. 
Ability to solve the leaderless dis- 
cussion group's problems appears to 
depend more on verbal than on other 
aptitude factors. 

A factor analysis of 14 leadership 
and ability measures by the author 
and Coates¹⁰ found that while rated
performance in initially leaderless dis- 
cussions correlated close to zero with 
one factor, Ability in Active Situa- 
tions, it correlated .44 with another 
factor, Ability in Verbal Situations. 
Sakoda's (62) analysis of OSS data 
also noted that discussion ratings 
fell into a cluster with other ratings 
based on “verbal” situations. Car-
ter, Haythorn, and Howell (30) ar-
rived at the same conclusion follow-
ing factorial analyses of several cri-
teria of leadership. These findings
may indicate the boundaries to the
range of real-life situations in which
leadership behavior can be forecast 
by rated LGD performance. 

The 10 available correlations be- 
tween intelligence and associated 
aptitude test scores and success in 
the LGD appear consistent with ex- 
pectations. Yet, verbal aptitude only 
accounts for 10 to 15 per cent of the 
variance in LGD, and so cannot be 
regarded as a more easily adminis- 
tered substitute for predicting success 
as a leader. 
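
The figure of 10 to 15 per cent follows from squaring the correlation coefficient, since the proportion of variance two measures share is r squared. A minimal sketch, using illustrative r values chosen to span the range of verbal aptitude correlations discussed above (the function name is ours, not the source's):

```python
def variance_explained(r: float) -> float:
    """Proportion of variance shared by two measures whose correlation is r."""
    return r * r

# Illustrative correlations in the neighborhood of those reported for verbal aptitude.
for r in (0.30, 0.32, 0.35, 0.39):
    print(f"r = {r:.2f} -> {variance_explained(r):.1%} of LGD variance")
```

Even at the upper end of this range, some 85 per cent of the variance in LGD ratings remains unaccounted for by verbal aptitude.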

Achievement and leaderless group 
discussion performance. Six studies 
are available of the correlation be- 
tween LGD ratings and _ tested 
achievement, years of education, or 
grades in college. Again, the correla- 
tions are uniformly positive, ranging 
from .16 to .31, with a median of .25. 
If the unusually high correlation of 
.57 is ignored because it is contami-
nated by the correlation of rank with
both education and LGD success, the
median becomes .20.

10 See footnote 7.

Above and beyond the effects of 
rank, a correlation of .30 appears to 
exist between LGD performance and 
supervisory aptitude as measured by 
an optimally weighted battery of 
interest, biographical, and super- 
visory judgment tests. 
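
One common way to estimate a relationship "above and beyond" a third variable such as rank is the first-order partial correlation. The source does not state which procedure was actually used, so the sketch below is offered only as an assumption about how such a figure could be obtained; the three input correlations are hypothetical.

```python
from math import sqrt

def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation of x and y with z held constant (first-order partial r)."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical values: x = LGD rating, y = test score, z = rank.
r_lgd_test = 0.57
r_lgd_rank = 0.60
r_test_rank = 0.70

print(f"{partial_correlation(r_lgd_test, r_lgd_rank, r_test_rank):.2f}")
```

When rank correlates substantially with both measures, as in these made-up figures, the partial correlation falls well below the raw correlation; when rank is unrelated to both, the two coincide.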

How Supervise?, a fairly widely 
used test of knowledge of principles of 
supervision, which aims to predict 
success as an industrial supervisor, 
correlates .46 with the LGD success 
of 66 college students (64). 

Personality and leaderless group dis- 
cussion performance. We shall now 
consider correlations obtained be- 
tween LGD performance and the 
specific personality traits found to be 
associated with leadership according 
to Stogdill. 

Stogdill cites five studies in which 
leaders are found to be more energetic 
than nonleaders. For the LGD, a 
correlation of .15 was found between 
general activity and energy as as- 
sessed by the Guilford-Zimmerman 
Temperament Survey for 76 sorority 
girls. A corresponding correlation of 
.12 was found for 66 college students 
(64). Also, LGD leaders were more 
often characterized in a Rorschach 
analysis as highly energetic, while 
LGD nonleaders more often were de- 
scribed as lazy or passive (18). 

Stogdill uncovered a large number 
of studies that found successful 
leadership associated with original- 
ity, soundness of judgment, and abil- 
ity to evaluate situations. Rorschach 
analysis characterized LGD leaders 
as strongly imaginative, strongly 
interested in details, and able to see 
the larger aspects of things, and LGD 
nonleaders as stereotyped or conven- 
tional in thoughts and perceptions, 
as unclear, plodding, and confused 
thinkers, and as unimaginative. A
study of 172 ROTC cadets found an
eta of .30 between the F scale of
authoritarianism and LGD perform-
ance. Highly authoritarian (hence
stereotyped and rigid) personalities
did extremely poorly in the discus-
sion, while equalitarian (but not too
equalitarian) cadets earned the high-
est LGD scores. Among samples of
100 and 67 of these same cadets,
Pearson correlations of .32 and .33
were found between LGD perform-
ance and Thurstone’s Concealed
Figures and Gestalt Completion tests,
both tests of perceptual flexibility.¹¹
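
Eta (the correlation ratio) is reported for the F scale rather than a Pearson coefficient because the relationship is curvilinear. The following sketch shows how eta captures a peaked relationship that a linear coefficient understates. The scores are hypothetical, constructed only to mimic the pattern described (best performance just short of the equalitarian extreme, worst at the authoritarian extreme), and the function names are ours.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_ratio(categories, values):
    """Eta: square root of between-category sum of squares over total sum of squares."""
    n = len(values)
    grand = sum(values) / n
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_between = 0.0
    for c in set(categories):
        grp = [v for cc, v in zip(categories, values) if cc == c]
        ss_between += len(grp) * (sum(grp) / len(grp) - grand) ** 2
    return sqrt(ss_between / ss_total)

# Hypothetical: F-scale band (1 = most equalitarian, 5 = most authoritarian).
f_band = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
lgd = [4, 5, 7, 8, 6, 7, 4, 5, 2, 3]

r = pearson(f_band, lgd)
eta = correlation_ratio(f_band, lgd)
print(f"Pearson r = {r:.2f}, eta = {eta:.2f}")
```

Eta is never smaller in absolute size than the Pearson coefficient computed on the same data, and the gap between them indicates how far the relationship departs from a straight line.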

Self-assurance and absence of 
modesty were uniformly associated 
with leadership in 17 studies cited 
by Stogdill. Similarly, self-esteem 
as measured by 140 self-nominations 
for sorority leadership positions cor- 
related .29 with LGD performance. 
An analysis of interviews with nine 
LGD leaders and nine LGD non- 
leaders from a total of 140 subjects 
using the Who-Are-You technique 
(28) showed that compared with non- 
leaders, LGD leaders more frequently 
regard in a more favorable light them- 
selves, their effects on others, and 
other persons’ effects on them. 

Discrepancies between Stogdill’s 
conclusions concerning leadership in 
general and LGD personality corre- 
lations arise when we consider per- 
sonality characteristics such as emo- 
tional stability, sociability, ascend- 
ency, and responsibility. 

Eleven studies have found emo- 
tional stability to be associated with 
leadership, although the evidence is 
not uniformly positive (63).  Simi- 
larly, LGD performance correlated 
.20 and .17 with emotional stability 
as measured by the Guilford-Zimmer- 
man inventory (18, 64). Rorschach 
analysis likewise characterized more 
LGD leaders than nonleaders as
emotionally stable. Overt social ad-
justment based on peer ratings cor-
related .28 with LGD performance.
On the other hand, five correlations
found between LGD performance and
the inventoried traits of freedom
from hypersensitivity and hostility
ranged from −.40 to .29 with a medi-
an of −.04 (18, 64).

11 See footnote 7.

Responsibility was uniformly re- 
ported by Stogdill to be associated 
with leadership. But a correlation 
of —.29 was found for 47 ROTC 
cadets between LGD ratings and re- 
sponsibility as measured by the 
Gordon Personal Profile, while cor- 
relations of .19 and .26 were found 
in two analyses of the relations be- 
tween thoughtfulness as assessed by 
the Guilford-Zimmerman and LGD 
performance (18, 64). 

Evidence concerning the relation 
between extroversion, ascendency, 
and leadership in general is contra- 
dictory according to Stogdill (63). 
Ascendency as measured by the 
Guilford-Zimmerman Temperament 
Survey correlated .44 with LGD 
performance of 76 sorority girls (18) 
and .25 with LGD performance of 
mixed college students (64); but as- 
cendency as measured by the Gordon 
Personal Profile and the A-S Reaction 
Test correlated only .02 and —.02, 
respectively, with LGD performance. 

Sociability as measured by the 
Guilford-Zimmerman in two studies 
(18, 64) correlated .27 and .31, re- 
spectively, with LGD performance; 
similar correlations between cooper- 
ativeness and LGD success were .14 
and .13, while sociability as measured 
by the Gordon Personal Profile cor- 
related close to zero with LGD per- 
formance. 

Finally, while Stogdill found ambi- 
tion and desire to excel important 
attributes of leadership, an average 
correlation of −.05 was found be-
tween LGD performance and ex-
pressed desire to hold sorority or
university student offices (23).


TABLE 5

CORRELATION BETWEEN LGD PERFORMANCE AND PERSONALITY TESTS OR MEASUREMENTS

Subjects | Personality Test or Measurement | Trait Probably Measured | Correlation
sorority members (18) | Guilford-Zimmerman Temperament Survey, G | General activity and energy | .15
mixed college students (64) | Guilford-Zimmerman Temperament Survey, G | General activity and energy | .12
sorority members highest and lowest on LGD (18) | Rorschach | Imagination, strongly interested in details, able to see larger aspects of things vs. conventionality, stereotypy, etc. |
ROTC cadets (15) | UCPOC F scale | Authoritarianism, rigidity | .30†
ROTC cadets* | Concealed Figures | Perceptual flexibility | .32
ROTC cadets* | Gestalt Completion | Perceptual flexibility | .33
sorority members (23) | Self-nominations for leadership positions in sorority | Self-esteem | .29
sorority members highest and lowest on LGD (18) | W-A-Y interview | Self-esteem | ‡
sorority members (18) | Guilford-Zimmerman Temperament Survey, E | Emotional stability | .20
mixed college students (64) | Guilford-Zimmerman Temperament Survey, E | Emotional stability | .17
sorority members (23) | Peer ratings | Overt social adjustment | .28
sorority members (18) | Guilford-Zimmerman Temperament Survey, F | Friendliness, agreeableness |
mixed college students (64) | Guilford-Zimmerman Temperament Survey, F | Friendliness, agreeableness |
ROTC cadets* | Gordon Personal Profile, H | Freedom from hypersensitivity |
sorority members (18) | Guilford-Zimmerman Temperament Survey, O | Freedom from hypersensitivity |
mixed college students (64) | Guilford-Zimmerman Temperament Survey, O | Freedom from hypersensitivity |
sorority members (18) | Guilford-Zimmerman Temperament Survey, P | Cooperativeness | .14
mixed college students (64) | Guilford-Zimmerman Temperament Survey, P | Cooperativeness | .13
sorority members (18) | Guilford-Zimmerman Temperament Survey, S | Sociability | .27
mixed college students (64) | Guilford-Zimmerman Temperament Survey, S | Sociability | .31
ROTC cadets* | Gordon Personal Profile, S | Sociability | .07
ROTC cadets* | Gordon Personal Profile, R | Responsibility | −.29
sorority members (18) | Guilford-Zimmerman Temperament Survey, T | Thoughtfulness, reflectiveness | .19
mixed college students (64) | Guilford-Zimmerman Temperament Survey, T | Thoughtfulness, reflectiveness | .26
sorority members (18) | Guilford-Zimmerman Temperament Survey, A | Ascendency | .44
mixed college students (64) | Guilford-Zimmerman Temperament Survey, A | Ascendency | .25
ROTC cadets* | Gordon Personal Profile, A | Ascendency | .02
mixed college students (64) | A-S Reaction Test | Ascendency | −.02
sorority members (23) | Ratings of motivation for various offices | Motivation to lead | −.05
uncoached college students* | Check list descriptions | Parental initiation |
coached college students* | Check list descriptions | Parental initiation |
uncoached college students* | Check list descriptions | Parental consideration |
coached college students* | Check list descriptions | Parental consideration |
uncoached college students* | Kerr Empathy Test | Social knowledge |
coached college students* | Kerr Empathy Test | Social knowledge |
coached college students* | Dymond Empathy Test (modified) | Accuracy of estimated self-ratings of others |
coached college students* | Dymond Empathy Test (modified) | Accuracy of estimated group ratings of self | .36
coached college students* | Dymond Empathy Test (modified) | Accuracy of estimated group ratings of others |
sorority members (18) | Guilford-Zimmerman Temperament Survey, M | Masculinity-femininity |
mixed college students (64) | Guilford-Zimmerman Temperament Survey, M | Masculinity-femininity |

* Unpublished data.
† Curvilinear relationship. Highly authoritarian subjects perform worst on LGD. Maximum LGD success experienced by equalitarian (but not too equalitarian) subjects.
‡ Chi-square analysis suggested that a significant positive correlation existed between the self-esteem of interviewees and their LGD scores.

It may be inferred that some con- 
sistency exists between Stogdill’s 
generalizations of the relations be- 
tween leadership in general and such 
personality traits as energy, flexibil- 
ity of judgment, and self-esteem, and 
the relations between these traits and 
LGD performance. Contradiction or 
lack of uniformity appears when we 
consider such traits as responsibility, 
emotional stability, ascendency, and 
sociability.¹²

12 Part of this lack of uniformity may be
due to variations among techniques used to
measure the various personality traits, and
variations in the sex composition of the sam-
ples studied.

The responsibility, ascendency, sociability,
and freedom from hypersensitivity scales of
the Gordon Personal Profile, a forced-choice
personality inventory, administered to sam-
ples of men only, tended to correlate zero or
negatively with LGD success. Corresponding
Guilford-Zimmerman scales with the same or
similar names (of the traditional self-report
type) tended to correlate positively with the
LGD success of a sample of women only (18)
and with the LGD success of mixed men and
women (64).

Assuming that Gordon's forced-choice pro-
cedure is less subject to distortion than the
Guilford-Zimmerman, and assuming that the
measurement techniques rather than the sam-
ples were a significant source of variance, one
might speculate that LGD performance tends
to be associated with a participant's concept
of himself, but only where the participant is
free to distort the description to suit himself.

To go one step further, it may be that LGD
success is related to the way a participant
likes to see himself, but not to the way he
actually sees himself when forced to make dis-
criminations over which he has less control.

Neither of these self-inventoried evaluations
actually meets the requirements of the orig-
inal hypothesis that the more able member is
more likely to be esteemed and to be success-
ful as a leader. The crucial evaluations of
ability for leadership and esteem are those
based not on self-evaluation, but on other
group members’ judgments of the participant,
or on objective tests of personality (as con-
trasted with inventories).

Participation and LGD perform-
ance. According to Stogdill, leaders,
in general, tend to be more talkative,
more industrious, and more likely to 
participate in group activities. The 
same appears to be true of high 
scorers on the LGD who, compared 
with low scorers, were found to make 
1.5 times as many responses in a
personal interview, and to give 1.6
times as many responses to the Rorschach
(18). LGD success also has been 
found, for 140 sorority members, to 
correlate .36 with the number of 
university student leadership posi- 
tions held per semester, and .10 
with the number of sorority positions 
held per semester (23). 

Parental leadership and LGD per-
formance. It is expected that child-
hood experiences and memories of
them should play a significant role 
in adolescent or adult leadership be- 
havior (13). In an unpublished study, 
the author hypothesized that LGD 
performance would be a function of 
the participants’ perception of how
they had been led by their parents.
Examinees used modified Ohio State
Leadership Studies behavior check
lists (38) to describe the extent to
which their mothers and fathers initi-
ated structure for them and were con-
siderate of them. The average and
range of correlations between such 
descriptions and LGD performance 
suggested that no consistent rela- 
tionship existed between parental 
descriptions and LGD performance. 

Empathy and LGD performance. 
A series of studies (e.g., 33, 67) has 
found a moderate relationship be- 
tween empathic ability and leader- 
ship. The results have not been uni- 
formly positive, mainly because the 
measure of empathy has varied great- 
ly from one investigation to another. 
Elsewhere (13), the author has de- 
duced that the more a person can 
accurately estimate the needs of 
others, the more likely he is to suc-
cessfully lead others. Chowdry and
Newcomb (33) have come closest to
testing this hypothesis, and have 
obtained positive results. 

Kerr and Dymond Empathy test 
data collected on a small sample 
(N’s=22 to 39) of students by 
Stolper (64) were correlated by the 
author with LGD performance 
scores, as shown in Table 5. The cor- 
relation of .36 between accuracy of es- 
timations of group ratings of self on the 
Dymond test and LGD performance 
was somewhat artifactual since LGD 
leaders probably led in the Dymond 
test also, and all members of a group 
are likely to agree more closely on 
leaders’ ratings compared to non- 
leaders, according to an earlier study 
by the author (6). 


Leadership Performance in Other 
Quasi-Real Situations and LGD 
Performance 


According to Fig. 1, leadership in 
other quasi-real situations is governed 
by the same individual factors as 
performance in the LGD. Therefore, 
the two should correlate fairly highly 
with each other if successful leader- 
ship is being measured in these other 
situations, and if ratings of LGD per- 
formance are valid as leadership rat- 
ings. Similar deductions can be 
made about LGD performance and 
success as a leader in real life.¹³

Table 6 summarizes the 17 avail- 
able correlations between leadership 
or “desirability for leadership posi-
tions” ratings based on LGD’s and
other situational tests. The correla- 
tions are almost uniformly highly 
positive, but many are contaminated 
because the same raters were used to 
assess candidates during the LGD 
and the other situations. 

The median correlation between
interview and LGD ratings is .70.
Where ratings are made by different
raters in each situation, the correla-
tion drops to .45 (9).

13 The same alternative interpretation and
rebuttal apply here as are presented in foot-
note 6.

Correlations of .37 and .67 were 
found between ratings based on as- 
signed leadership and leaderless dis- 
cussion (3, 59). Correlations between 
the LGD and other verbal situational 
tests, such as debates and commit- 
tee work, are almost as high as be- 
tween the LGD and its retest, aver- 
aging .70. As the other situations 
involve less discussion and more me- 
chanical or athletic activity, the me- 
dian correlation between the LGD 
and these other situational tests is 
reduced to around .52.

Leadership ratings based on the 
leaderless group discussion alone 
correlated .64 with the final leader- 
ship ratings based on the entire OSS 
battery of situational tests (59). 


Leadership Performance in Real-Life 
Situations and LGD Performance 


Table 7 summarizes the correla- 
tions between LGD performance and 
real-life leadership performance.¹⁴
While LGD ratings correlate some- 
what with the tendency to hold lead- 
ership offices, they appear to be 
associated more with the tendency in 
real life to initiate structure and inter- 
action among associates and sub- 
ordinates (r=.32). On the contrary, 
a low negative correlation exists be- 
tween LGD performance and the 
tendency in real life to be considerate
of the welfare of subordinates and
associates (r = −.25).¹⁵

14 An attempt has been made in this article
to treat separately from the more numerous
studies of LGD performance correlated with
merit ratings of real-life performance (Table
3), studies of the LGD correlated with objec-
tive indices of real-life leadership performance
or fairly nonevaluative descriptions of real-life
leadership performance. While usually em-
pirically related, merit rating of performance
is considered conceptually independent of
actual performance or descriptions of per-
formance.


TABLE 6

CORRELATION BETWEEN PERFORMANCE IN OTHER QUASI-REAL SITUATIONS AND IN THE LGD

No. | Subjects | Other Quasi-Real Situational Tests | Trait Measured | Correlation
1 | 223-442 OSS applicants (59) | Interview | Leadership |
2 | foreign service candidates (69) | Interview | Desirability for job |
3 | administrator candidates (69) | Interview | Desirability for job |
4 | sales management trainees (9) | Interview with different raters | Desirability for job | .45
5 | shoe factory executive trainee candidates (65) | Interview and tests | Desirability for job |
6 | 223-442 OSS applicants (59) | Assigned leadership | Leadership |
7 | administrator candidates (3) | Assigned leadership | “Personality and ability” |
8 | 223-442 OSS applicants (59) | Debate | Leadership |
9 | 223-442 OSS applicants (59) | Brook | Leadership |
10 | 223-442 OSS applicants (59) | Construction | Leadership |
11 | 40 NROTC subjects (31) | Leaderless Mechanical Assembly, 4-man | Leadership |
12 | 40 NROTC subjects (31) | Leaderless Mechanical Assembly, 8-man | Leadership |
13 | 40 NROTC subjects (31) | Leaderless Group Reasoning Task, 4-man | Leadership |
14 | 40 NROTC subjects (31) | Leaderless Group Reasoning Task, 8-man | Leadership |
15 | foreign service candidates (69) | Other situational tests such as committee work | Desirability for job |
16 | administrator candidates (69) | Other situational tests such as committee work | Desirability for job |


Studies of Factors Associated with
LGD Performance


Two factorial studies analyzing
LGD test and retest scores in a cor-
relation matrix of real-life, situational
test, and psychological test meas-
ures are available. The first (23)
was based on 41 measurements of
140 sorority girls; the second con-
cerned 14 measures made of 66 to 244
ROTC cadets.¹⁶ Table 8 lists the
loadings of the LGD test and retest
on the factors that accounted for
most of the variance of the LGD.

15 See footnote 7.

In line with propositions stated
earlier, ratings based on LGD per-
formance appear to assess the extent
to which an individual initiates
structure or is socially bold in am-
biguous situations. Esteem, or per-
sonal worth, and verbal ability are
also involved. On the basis of these
studies it may be inferred that there
exist three independent sources of
variance underlying leaderless group
discussion performance:
I. Tendency to Initiate Structure
II. Tendency to Be Esteemed or
of Value to a Group
III. Ability in Verbal Situations.
Little specific variance is left when
the common variance accounted for
by these factors is extracted.

16 See footnote 7.


TABLE 7

CORRELATION BETWEEN LGD PERFORMANCE AND REAL-LIFE BEHAVIOR

No. | Subjects | Method of Measurement | Real-Life Behavior Measured | Correlation
1 | 140 sorority girls (23) | Extent of extracurricular activities | Participation in social groups |
2 | 140 sorority girls (23) | Number of university leadership positions held | Real-life leadership performance | .36
3 | 140 sorority girls (23) | Number of sorority leadership positions held | Real-life leadership performance | .10
4 | 140 sorority girls (23) | Extent peers know her well enough to rate | Visibility among associates |
5 | 180 ROTC cadets* | Final cadet rank achieved | Future success as a leader |
6 | ROTC cadet officers* | Ratings of subordinates and peer associates | Degree of “consideration of subordinates” | −.25
7 | ROTC cadet officers* | Ratings of subordinates and peer associates | Degree of initiation of structure and interaction among associates and subordinates | .32
8 | sales trainee candidates (9) | Number of leadership positions held previously | Real-life leadership performance | ‡

* Unpublished data.
† Eta (curvilinear relationship).
‡ Significant difference at 15 per cent level between LGD score of experienced leaders and those without experience as leaders.


TABLE 8

ORTHOGONAL FACTORS CORRELATED WITH OVER-ALL LGD PERFORMANCE

Factor | Correlation in Sororities (23)
I. Leadership Potential (Esteem) |
II. Ascendency-Sociability | .39
III. Verbality |
IV. Intellectualism | .21

Factor | Correlation in ROTC*
I. Esteem | .30
II. Tendency to Initiate Structure |
III. Ability in Verbal Situations | .44

* Unpublished data.


SUMMARY


The history, applicability, relia-
bility, and validity of the leaderless
group discussion as a means of assess-
ing variations among persons in the
tendency to exhibit successful leader-
ship behavior have been considered.
While the procedure was originated as
a psychological technique in Ger-
many over thirty years ago, it is
only in the last decade that system-
atic reliability and validity studies
have appeared.

High interrater agreement and
high test-retest reliabilities have been
reported consistently, especially
where descriptive behavior check
lists have been used as the rating
technique.

Group size, length of testing time,
type of problem presented, directions,
seating arrangement, number of rat-
ers, and rating procedure influence
to a greater or lesser extent per-
formance in the LGD as well as the
reliability and validity of LGD rat-
ings. Studies of the effects of many
of these have appeared since 1950.

According to both deductive and
inductive evidence, a valid assess-
ment of the tendency to display
successful leadership should correlate
with: (a) status as measured by rank,
when the assessment is based on per-
formance among associates of differ-
ent rank; (b) esteem in real life as esti-
mated by merit ratings; (c) successful
leadership performance in other
quasi-real and real-life situations; (d)
personal characteristics as measured
by psychological tests, such as capac-
ity, proficiency, responsibility, and
participation. On the whole, ratings
based on performance in the LGD
tend to do this. Therefore, it is in-
ferred that they have some validity
as assessments of the tendency to
display successful leadership, i.e.,
leadership potential.
Other evidence suggests that the 
successful leadership behavior ob- 
served in the LGD concerns primarily 
initiation of structure rather than con- 
sideration of the welfare of others. 
The preceding analysis, coupled 
with recommendations made by 
others working in the field of situa- 
tional tests, such as Weislogel (71), 
leads us to the following hypotheses:

1. To maximize the reliability and
validity of the LGD and other situa- 
tional tests, scoring techniques should 
minimize reliance on the ability of 
observers to infer differences in per- 
sonality traits and future tendencies 
among examinees. Observers should 
merely report or evaluate the im- 
mediate behavior they observe. For 
example, in an unpublished study, 
the author found that two Army 
colonels’ estimates of the potential 
as Army officers of ROTC examinees 
were less valid as predictors of the 
merit ratings of the examinees than 
were the colonels’ check list descrip- 
tions of who initiated structure dur- 
ing the LGD. Similar results were 
noted in a study of fraternity mem- 
bers reported by the author and 
White (20). 

When the observer makes an infer- 
ence about the future behavior of an 
examinee from the observations of 
the examinee during the LGD, 
several potential errors are likely. 
The observer may err in deciding on 
which dimensions to make inferences; 
he may err in collating his observa- 
tions with the future behavior to be 
predicted; and finally, the dimen- 
sions on which the inferences are 
made may be private ones which 
cannot be shared with other obser- 
vers. The errors may be constant, 
variable, or both. Lack of knowl- 
edge and control over such errors 
disappears when raters are merely 
asked to describe what they observed 
and these descriptions are used as 
predictors. 

Further reduction of uncontrolled 
raters’ errors may be made in the 
following ways: 

a. Objective criteria for describing 
specific behaviors can be used (71). 
In the LGD, the actual number of 
times a participant suggests a new 
approach to a problem can be noted
instead of rating “to what extent did
the participant suggest new approaches
to problems.”

b. Forced-choice check lists can 
be used instead of present check lists. 
Otis’ (60) recent successful applica- 
tion of the forced-choice technique to 
interviewer ratings indicates promise 
for applying the same procedure to 
the LGD. 

2. To maximize validity, problems 
that are equally ambiguous to all 
participants, and that require the 
initiation of structure for their solu- 
tion, should be used. Where interest 
is in forecasting leader behavior in 
real life, the structure to be set up 
should approximate the real-life set- 
ting as much as possible. 

3. Since the LGD correlates fairly 
highly with most other intellectual 
or verbal situational tests, the use of 
many situational tests in a battery to 
forecast leadership potential is of 
doubtful utility. Thus, leadership
ratings based on a one-hour LGD correlated
above .60 with leadership
assessments based on the three days
of OSS situational testing (59). A
similar correlation between the LGD
and an entire battery of situational
tests was found by Vernon (69).
However, a significant proportion of 
the variance in over-all potential as 
a successful leader, unaccounted for 
by LGD, may be predicted by a fairly 
pure active or mechanical, initially 
leaderless, situational test which min- 
imizes variance due to verbal ability. 

4. Compared to paper-and-pencil 
techniques, the LGD is expensive; 
compared to the individual interview, 
in many locales, it may prove eco- 
nomical. The LGD appears feasible 
administratively, especially in mili- 
tary programs screening OCS or ad- 
vanced ROTC applicants, in civil 
service examinations, in 
college seniors who are to be assessed
at their colleges for management
trainee positions, and anywhere else
where “boards” have been used
traditionally, such as in the selection
of public school teachers.

5. While the LGD appears to have 
some validity as a predictor of the 
tendency to be a successful leader in 
a number of situations, especially in 
comparison to other assessment techniques,
tailor-made batteries of paper-and-pencil
tests will undoubtedly
yield higher validities in designated
situations. However, it may be that
just as the brief intelligence test 
is applicable for predicting train- 
ability for many skilled occupa- 
tions, so the LGD will provide a 
general technique for partially as- 
sessing potential success as a leader 
in a relatively wide range of situa- 
tions. 

6. A number of situations in which 
the LGD is less likely to be success- 
ful may include the following: 

a. The LGD is less likely to be 
valid for measuring or forecasting 
esteem or leadership potential when 
examinees can be tested only among 
others of different rank. In such a 
case, status, and not esteem or personality,
will determine who
succeeds in the LGD.

b. The LGD is less likely to be 
useful where factors peculiar to the 
situation block initiation of structure 
where no structure exists. Conceiv- 
ably, in certain military settings for 
example, the examinees may be imbued
with the dictum “never volunteer
for anything.” However, exactly
how this would affect LGD validities 
is unknown. 

c. Another unknown is the effect 
of the average verbal aptitude and 
educational status of the participants 
on the validity and utility of the 
LGD. It is expected that where this 
mean falls below a certain minimum,
LGD forecasting efficiency may suffer.¹⁷

d. Since achievement and intelligence
appear to correlate with LGD
performance as well as with success
as a leader in real life, any restriction
in the range of intelligence or achievement
of LGD participants would be
likely to reduce the forecasting efficiency
of the LGD. Conversely, the
greater the variance in intelligence


THE PARADOX


In recent years several authors 
have called attention to a paradox in 
test theory. Gulliksen (7) appears to 
have been the first. He showed that, 
under reasonable conditions, “In 
order to maximize the reliability and 
variance of the test, items should 
have high intercorrelations, all items 
should be of the same difficulty level, 
and this level should be as near to 
50% as possible.” But, he continued,
“The criterion of maximizing test 
variance cannot be pushed to ex- 
tremes. Test variance is a maximum 
if half of the population makes zero 
scores, and the other half makes per- 
fect scores. Such a score distribution 
is not desirable for obvious reasons, 
yet current test theory provides no 
rationale for rejecting such a score 
distribution” (7, pp. 90-91). 

All studies to be reviewed in this 
paper assume homogeneous tests, in 
the sense that all correlations be- 
tween items within a test are ac- 
counted for by a single common fac- 
tor. Validity, as used herein, refers to 
correlation of the test with that com- 
mon factor. While the studies are 
explicitly concerned with reliability, 
in this context reliability refers only
to degree of homogeneity. There is
no concern with stability of functions 
over time. 

Tucker stated the paradox as follows: “Consider the case when all
the items in a test are equivalent; that is, when the items all
measure the same trait, have equal reliabilities, and are equally
difficult. In this case the items are equally intercorrelated with
coefficients equal to the item reliabilities . . . If the
reliability of the items were increased to unity, all correlations
between the items would also become unity and a person passing one
item would pass all items and another failing one item would fail
all items. Thus the only possible scores are a perfect one or one
of zero” (19, pp. 1-2). He pointed out, as a consequence of this
paradox, that under these circumstances increasing the reliability
of a test beyond a certain point will decrease the validity of the
test, in contradiction to the usual belief, embodied in the
correction for attenuation, that increasing the reliability of the
test always increases its validity.
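
The "usual belief" referred to here can be made concrete with the classical attenuation formula, r_xy = r_true * sqrt(r_xx * r_yy). The following is only a minimal sketch of that textbook relation; the function names and the numbers are illustrative assumptions, not taken from any of the studies reviewed. Under the formula, observed validity can only rise as test reliability rises, which is exactly what the paradox contradicts.

```python
import math

def observed_validity(true_r, rel_x, rel_y=1.0):
    """Classical attenuation: observed r = true r * sqrt(r_xx * r_yy)."""
    return true_r * math.sqrt(rel_x * rel_y)

def disattenuated(r_xy, rel_x, rel_y=1.0):
    """Correction for attenuation: estimate of the correlation between true scores."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Hypothetical values: a true correlation of .80 seen through tests of
# increasing reliability. The observed validity rises monotonically.
for rel in (0.36, 0.64, 0.81, 1.00):
    print(rel, round(observed_validity(0.80, rel), 3))
```

The formula treats reliability as a free quantity that can be raised without changing the distribution of scores; for dichotomous items of equal difficulty that premise fails, which is where the paradox enters.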

Brogden conceived of the problem as one of “determining the
distribution of item difficulties which will maximize the
correlation of the test with a perfect measure of the
characteristic the test is intended to measure” (1, p. 197). With
perfectly valid items, he pointed out, the difficulties should be
equally spaced in some sense, whereas with items that do not
intercorrelate, the difficulties should all be as close to .5 as
possible. Brogden stated that item selection procedures aimed
solely at increasing the reliability of the test may result in a
decrease in validity, again, the “attenuation paradox.”

Two writers who missed the attenuation paradox were Loevinger (11)
and Cronbach (3). Loevinger called attention to the anomalous
result of using equivalent items when the item intercorrelations
approached unity and concluded that equivalent items were
undesirable in the usual case. Since the phi coefficient and
KR 20 (Kuder and Richardson's [10] formula 20) measure the
departure of the items from equivalence as well as the
interrelationships of the underlying variables, she concluded that
they were inappropriate as item-selection coefficients.

Cronbach, in refuting Loevinger, stated: “The phi coefficient which
tells when items do and do not duplicate each other is a better
index just because it does not reach unity for items of unequal
difficulty” (3, p. 329). Here Cronbach neglected the fact that if
one uses the phi coefficient for item selection, one needs two
rules: For lower values of phi, the higher the coefficient, the
more will the two items contribute to the validity of the test.
But for high values of phi, the lower the coefficient, the more
will the two items contribute to the validity of the test.

Similarly, Cronbach showed that the maximum value of KR 20 is not
much less than unity for items with a specified distribution of
item difficulties, and that the maximum value will drop for a
greater range of item difficulties. If, however, maximizing KR 20
is made a rule for item selection, as Cronbach recommends, there
will be a tendency to select items with a narrower range of
difficulty, or, in Cronbach's terms, more redundant items. Here
Cronbach apparently failed to see the paradox that whereas
maximizing KR 20 for constant number of items will lead to
increasing the validity of tests where the item intercorrelations
are low, it will lead to decreasing the validity of tests where
the item intercorrelations are high.
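
For readers who want the two coefficients in computable form, the sketch below implements the standard textbook definitions of the phi coefficient and of KR 20. The function names and the tiny response matrices are hypothetical illustrations, not data from Loevinger or Cronbach. The second example shows the property Cronbach pointed to: two items of unequal difficulty that form a perfect scale still have phi well below unity.

```python
import math

def phi_coefficient(x, y):
    """Product-moment correlation between two 0/1 item score vectors."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    pxy = sum(a * b for a, b in zip(x, y)) / n
    return (pxy - px * py) / math.sqrt(px * (1 - px) * py * (1 - py))

def kr20(responses):
    """Kuder-Richardson formula 20; rows are examinees, columns are 0/1 items."""
    n_people, k = len(responses), len(responses[0])
    p = [sum(row[i] for row in responses) / n_people for i in range(k)]
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_people
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    return (k / (k - 1)) * (1 - sum(pi * (1 - pi) for pi in p) / var_total)

# Hypothetical data: three perfectly duplicating items of difficulty .5.
equivalent = [[1, 1, 1], [1, 1, 1], [0, 0, 0], [0, 0, 0]]
print(kr20(equivalent))                 # KR 20 reaches unity; scores are only 0 or 3

# Hypothetical data: an easy and a hard item forming a perfect scale.
easy, hard = [1, 1, 1, 0], [1, 0, 0, 0]
print(phi_coefficient(easy, hard))      # about .33, well below unity
```

In the first example the score distribution is exactly the all-or-none distribution described by Gulliksen and Tucker, even though KR 20 is at its maximum.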

The resolution of the attenuation paradox thus lies in having two
rules for test construction. For the “classical region,” the
region in which the attenuation of validity decreases with
increase in reliability, the closer the items are to difficulty of
.5 and thus to equivalence, the more reliable and more valid will
the test be. For the “region of paradox” the optimal distribution
of item difficulties must be determined as a function of item
intercorrelations. This solution was implied by Brogden and stated
by Davis (5).


DEFINITION OF THE REGION 
OF PARADOX 


Four studies have contributed to 
the definition of the region of para- 
dox, those of Tucker (19), Brogden 
(1), Davis (5), and Cronbach and 
Warrington (4). 

Tucker (19) assumed that ability or true score is normally
distributed, that probability of success on any item is related to
true score by the normal ogive, and that all items are equally
difficult. He found that with median equivalent items, i.e.,
equivalent items of difficulty equal to .5, the validity of the
test constantly increased as the item reliability increased for a
one-item test. For a 10-item test optimal item reliability
(interitem correlation) was about .5, and for a 100-item test
optimal item reliability was about .25. The maximum validity was
.8 for a one-item test or for a test with perfectly reliable
items, however many, since the latter case is equivalent to that
of one item. Maximum validity was .97 with 100 items and slightly
over .9 with 10 items.

Tucker investigated a single case of non-median equivalent items.
All items were of such difficulty that the ability level where the
probability of passing is .5 was one standard deviation above the
mean. (It will be convenient to refer to this case as tests for
which z=1. Results are identical for z=−1, that is, for tests
where all items are of such difficulty that the ability level
where half the individuals pass the item is one standard deviation
below the mean.) In this case the optimal interitem correlation
for a 10-item test was about .3, for a 100-item test about .15.
The corresponding validities were about .83 and .96. For a
one-item test the validity again increased as a function of item
reliability to a maximum value of about .66. This value is also
the terminal validity for a test of any number of items as the
item reliability approaches unity, given that all items are of the
specified difficulty level.
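
The two terminal validities quoted above (.8 at z=0 and about .66 at z=1) are what one would expect if a perfectly reliable item is treated as a clean cut on a standard normal ability scale. In that case the validity is the point-biserial correlation between the pass-fail split and ability, phi(z) / sqrt(p(1 − p)), where phi is the normal density and p the proportion passing. The sketch below is a reconstruction under that assumption, not necessarily Tucker's own derivation; the function names are hypothetical.

```python
import math

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def terminal_validity(z):
    """Correlation of standard normal ability with a perfect pass/fail cut at z."""
    p_pass = 1.0 - normal_cdf(z)          # proportion of examinees above the cut
    return normal_pdf(z) / math.sqrt(p_pass * (1.0 - p_pass))

print(round(terminal_validity(0.0), 2))   # 0.8
print(round(terminal_validity(1.0), 2))   # 0.66
print(round(terminal_validity(-1.0), 2))  # 0.66, same as z = 1 by symmetry
```

Because a test of perfectly intercorrelated items yields only the scores zero and perfect, any number of such items carries the same ceiling as a single item.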

The optimal values of item reli- 
abilities under his conditions were 
surprisingly low, Tucker pointed out; 
however, he cautioned that exceeding 
these values caused less decrement in 
validity than falling short of them. 
Further, where item difficulties are 
not all equal, his results do not hold 
exactly. 

Brogden (1) assumed that the true 
score of ability was normally dis- 
tributed, that the tetrachoric cor- 
relations of all pairs of items within 
a test were equal, and the biserial r 
of each item with true score was the 
same for all items in a test. Phi co- 
efficients between items and point 
biserials between items and true score 
were not always constant within a
test because items were permitted to
differ in difficulty. Validity coefficient 
of the test was studied as a function 
of tetrachoric r of items, number of 
items, and distribution of item diffi- 
culties. Item inter-r’s had the values 
.2, .4, .6, and .8. Numbers of items 
were 9, 18, 45, 90, and 153. All item 
difficulties were concentrated at half 
sigma units between and including 2.0 
and —2.0. The types of distributions 
were rectilinear, normal, skewed, and 
constant at each of the sigma values. 

Some of Brogden’s results are displayed in Fig. 1, 2, and 3.
Figure 1 shows validity as a function of item intercorrelation for
tests composed of median equivalent items, that is, items for
which z=0. The four curves correspond to four values for n, the
number of items. Figure 2 is similar to Fig. 1, except that all
items are of difficulty z=1. (The curves of Fig. 2 also apply if
all items are of difficulty z=−1.) While Fig. 2 again shows the
attenuation paradox as a function of number of items, comparison
of Fig. 1 and 2 shows the attenuation paradox as a function of
item difficulty. Figure 3 shows validity as a function of
distribution of item difficulties for tests composed of 90 items.
The curve for z=0 is also a member of another family of curves
shown in Fig. 3. These curves show what happens to a test of 90
equivalent items as the difficulty departs from median value. The
data for these graphs are taken from Brogden’s Tables 2 and 3.

FIG. 1. ... COMPOSED OF MEDIAN EQUIVALENT ITEMS. DATA FROM
BROGDEN’S (1) TABLE 2.

FIG. 2. ATTENUATION PARADOX AS A FUNCTION OF NUMBER OF ITEMS FOR
TESTS COMPOSED OF EQUIVALENT ITEMS OF DIFFICULTY z=1. DATA FROM
BROGDEN’S (1) TABLE 3.

Reading across the three figures, one sees that, in general,
validity increases with increasing item inter-r up to a point and
then decreases. The only curve which continues to rise in these
figures is that for rectangular distribution of item difficulties.
For a large number of non-median equivalent items, the optimal
item inter-r is apparently less than .2. Paradox is exhibited
whenever validity decreases as reliability (item inter-r)
increases, that is, whenever the curve slopes downward. Brogden’s
method, unlike Tucker’s, does not permit ascertaining exactly the
optimal value of item inter-r, so the computed points have been
connected by straight lines rather than attempting to sketch a
curve. Tucker’s article contains actual curves which may be
compared with the figures here.

Reading up and down Fig. 1 and 2, one sees that validity always
increases with increasing number of items, provided item inter-r
and difficulty distribution are held constant; however, the
increase in validity obtained by increasing n becomes smaller as
the item inter-r increases.


Figures 1 and 2 show that optimal item inter-r decreases as n
increases for constant item difficulty; in other words, the region
of paradox increases as n increases. Comparison of Fig. 1 and 2
illustrates that validity drops as item difficulty departs from
the median value for constant n and item inter-r. The same
comparison illustrates that for tests composed of equivalent
items, the region of paradox increases as the item difficulty
departs from the median. (Support for these generalizations is
contained in further data not reproduced here.)
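
The generalizations above can be checked numerically under one reading of the assumptions shared by Tucker and Brogden: ability is standard normal, every item follows a normal ogive centered at the same difficulty z, and the tetrachoric correlation between any two items equals the square of each item's biserial correlation with ability. The sketch below computes the correlation between number-correct score and ability by simple numerical integration. It is an illustrative reconstruction, not either author's procedure; the function and parameter names are hypothetical, and the tetrachoric r must lie strictly between 0 and 1.

```python
import math

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def score_validity(n_items, tetrachoric_r, z=0.0, grid=4001, limit=8.0):
    """Correlation of number-correct score with ability for n equivalent items."""
    rho = math.sqrt(tetrachoric_r)              # assumed item-ability biserial
    sigma = math.sqrt(1.0 - rho * rho) / rho    # SD of the item ogive
    step = 2.0 * limit / (grid - 1)
    mean_p = mean_p2 = mean_tp = 0.0
    for k in range(grid):                       # integrate over the ability scale
        theta = -limit + k * step
        weight = normal_pdf(theta) * step
        p = normal_cdf((theta - z) / sigma)     # chance of passing one item
        mean_p += weight * p
        mean_p2 += weight * p * p
        mean_tp += weight * theta * p
    var_true = mean_p2 - mean_p ** 2            # variance of expected item score
    var_error = mean_p - mean_p2                # average within-person item variance
    var_score = n_items * var_error + n_items ** 2 * var_true
    return n_items * mean_tp / math.sqrt(var_score)

for n in (1, 9, 90):
    print(n, [round(score_validity(n, r), 3) for r in (0.2, 0.4, 0.6, 0.8)])
```

Under these assumptions the one-item test gains validity steadily as inter-r rises, while the 90-item test peaks at a moderate inter-r and then declines, and moving z from 0 to 1 lowers validity at fixed n and inter-r. Note that inter-r here is tetrachoric, as in Brogden's tables; a coefficient computed on the dichotomous scores themselves would take lower values for the same items.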

The two highest curves of Fig. 3 show an important manifestation
of the attenuation paradox: Median equivalent items produce tests
of higher validity for low values of item inter-r, while a
rectangular distribution of difficulty produces tests of higher
validity for high values of item inter-r. The curve for normal
distribution of item difficulties lies between that for
rectangular distribution and that for z=0 but was omitted to make
the figure more legible. The point at which rectangular
distribution produces higher validity than median equivalent items
may be taken as defining the limits of the region of paradox. Data
from Brogden’s Table 2 show the following: Higher validity was
obtained by tests of median equivalent items than by tests with
distributed difficulties for item inter-r’s of .2 and .4; however,
for r of .4 and large number of items (90 or 153), there was
almost no difference in validity for tests with median equivalent
items and with item difficulties distributed normally or
rectilinearly. For item inter-r equal to .6, the test with median
equivalent items was superior for a 9-item test, ran about equal
with tests of rectilinear and normal distribution of item
difficulties for 18- and 45-item tests, and ran slightly behind
those with distributed difficulties for 90 and 153 items. For item
inter-r’s equal to .8, distributed difficulties were clearly
superior to equivalent items, more so as the number of items
increased. The rectilinear had a slight advantage over the normal
distribution for tests with more than 18 items.

The curve in Fig. 3 for skewed dis- 
tribution of item difficulties falls far 
below the curves for rectangular, 
normal, and median equivalent items. 
This decrement is chiefly a result not 
of skewness but of the departure of 
mean item difficulty from z=0. The 
rectangular and normal distributions 
do have means at z=0, whereas for 
the skewed distribution the mean 
(computed by the reviewer) is at 
z=1.21. 

For a given item inter-r, where tests are composed of items of
constant difficulty, the validity falls off fairly sharply with
the departure of that difficulty from the median; moreover, the
greater the departure of the difficulty from the median, the lower
the optimal value of the item inter-r. Comparing now the curve for
skewed difficulties, which corresponds to a mean z=1.21, with
tests of constant difficulty equal to 1 and to 1.5, one obtains at
least a hint that distribution of item difficulties protects the
test from decrement of validity due to departure of difficulty
from the median, at least for moderate and high values of item
inter-r. More striking is the fact that despite the distribution
of item difficulties, validity constantly falls with increasing
item inter-r within the range considered, when the mean difficulty
of the items is not appropriate for the group tested and the
number of items is large. Practical considerations often lead to
the use of a test with a group whose mean ability is not exactly
the same as the mean difficulty of the items. Thus this important
effect deserves much more extended investigation.

Davis (5) assumed that the most desirable distribution of test
scores is a rectangular one and sought the conditions under which
such a distribution could be obtained. He concluded that where the
tetrachoric item inter-r’s are .5 or less, the closest approach to
a rectangular distribution will be obtained with median equivalent
items. As the item inter-r rises above .5, the item difficulties
should be increasingly dispersed. In comparing Davis’ result with
those of Tucker and Brogden, note that the latter investigators
assumed that the true scores were normally distributed. Validity
coefficients thus could equal unity only if obtained scores also
were normally distributed. In effect, then, Tucker and Brogden
assumed a normal distribution as a desideratum while Davis assumed
a rectangular distribution.*

* Davis’ derivation, obtained from him in mimeographed form,
appears to be somewhat less rigorous than the other two.

Cronbach and Warrington (4) assumed that the probability of
success on any item is described by the normal ogive; however, in
contrast to the preceding studies, in which persons with no
ability were assumed to have zero probability of passing the item,
they dealt only with three-choice items. Thus persons with no
ability had a 1/3 chance of passing any item. The standard
deviation of the ogive, σ, was the same for all items within a
test. Most of the tests had 30 items. The underlying ability was
assumed to be normally distributed. The scale value of each item
is that level of ability for which the probability of passing the
item is 2/3 (or, if corrected for chance, 1/2). Items were located
at scale values between 2.5 and −2.0, inclusive, at intervals of
.5. Pattern A had 30 items at scale value 0; pattern B had 6 at
.5, 18 at 0, and 6 at −.5; pattern C had 6 each at 1.0 to −1.0
inclusive; and pattern D had 3 each at 2.5 to −2.0 inclusive.
Cronbach and Warrington’s chief problem was outside the realm of
the attenuation paradox. They were concerned with the screening
efficiency of the tests for various possible cutting scores from
close to zero to over 90%. Screening efficiency was measured by
r_bis of the dichotomized score scale against the continuous,
normally distributed scale of ability. For each test a series of
such validity coefficients was obtained and its curve plotted.
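
The design just described can be written down compactly. The sketch below lists the four item patterns as given in the text and uses a common way of adding a chance floor to a normal ogive, P = c + (1 − c) Φ((θ − b) / σ) with c = 1/3 for three-choice items. That functional form is an assumption about how the chance success rate enters, since the review states only the 1/3 floor and the 2/3 scale-value convention; the names are hypothetical, and the case σ=0 (perfectly precise items) is not handled.

```python
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def pass_probability(theta, scale_value, sigma, n_choices=3):
    """Assumed model: chance floor plus a normal ogive for examinees who know."""
    chance = 1.0 / n_choices
    return chance + (1.0 - chance) * normal_cdf((theta - scale_value) / sigma)

# Item patterns as described: scale value -> number of items at that value.
PATTERNS = {
    "A": {0.0: 30},
    "B": {0.5: 6, 0.0: 18, -0.5: 6},
    "C": {v / 2: 6 for v in range(-2, 3)},   # -1.0 to 1.0 in steps of .5
    "D": {v / 2: 3 for v in range(-4, 6)},   # -2.0 to 2.5 in steps of .5
}

for name, pattern in PATTERNS.items():
    print(name, sum(pattern.values()), "items at", sorted(pattern))

# At the item's own scale value the pass probability is 2/3 under this model.
print(pass_probability(0.5, 0.5, 1.0))
```

With this form an examinee far below the scale value passes with probability 1/3, and an examinee exactly at the scale value passes with probability 2/3, matching the two figures given above.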


For perfectly precise items (σ=0) the validity of pattern A was
greatest only in the neighborhood of a cutting score of .5, and
fell sharply below that of other patterns as the cutting score
deviated from the median point. The range of cutting scores for
which pattern A was superior increased as precision decreased
until, for values of σ≥1, pattern A was superior throughout most
of the range. A somewhat similar family of curves was generated by
holding pattern constant and letting item precision vary; that is,
the greater the item precision, the more valid the test for
cutting scores near the median, but the less valid for extreme
cutting scores. This tendency was particularly clear for pattern
A, which they called a “peaked” test, and is a manifestation of
the attenuation paradox.

Cronbach and Warrington proposed an interesting integration of
their findings. They suggested that increasing the variance of the
scale values of the items has about the same effect as increasing
the variance of the item ogive (σ²), and that the important
quantity to be considered is the sum of the two variances. Using
eta rather than r as a measure of over-all validity, they found
that under their conditions validity increased as a function of
the sum of the two variances to a maximum where the sum of the
variances was about .5, then slowly declined.

At one point they stated, “Insofar as we can judge from these and
Tucker’s results, the peaked test has superior validity for even
lower values of σ (higher r_ij) when the test is longer than
thirty items” (4, p. 146). Cronbach and Warrington’s data on the
effect of test length are scanty, but the data of Tucker and
Brogden are clear, and opposite to what Cronbach and Warrington
indicate: The longer the test, the greater the region of paradox.
(See Fig. 1 and 2.)

An incidental finding of Cronbach and Warrington’s study was that
the peaked tests tended to have bimodal or markedly skewed
distributions, while the tests with dispersed difficulties had
more nearly normal distributions. They did not consider the
possibility that this finding is itself a manifestation of the
attenuation paradox. As reason for using r_bis they stated,
“Unlike product-moment r, r_bis is independent of the test score
metric. Our results are therefore invariant as test scores are
transformed to other scales” (4, p. 132). But, one may ask, what
operations correspond to transforming test scores to other scales?
Given their conditions, only their scale results. The effect of
various changes in conditions would have to be investigated.
Indeed, it is generally considered a requirement for use of r_bis
that the dichotomized variable be normally distributed. The effect
of using r_bis and eta as opposed to r apparently is to increase
the advantage of peaked as opposed to distributed item
difficulties, and thus in effect to underestimate the region of
paradox.

In summary, the region of paradox is most clearly defined in the
studies of Brogden and Tucker. Neither Davis’ study nor that of
Cronbach and Warrington detected that the region of paradox
increases with the number of test items, an effect clearly
discernible in the two previous studies. If Cronbach and
Warrington had used the product-moment r instead of eta they
apparently would have obtained results comparable to those of
Brogden and Tucker. Cronbach and Warrington called attention to
the importance of considering items where there is a nonnegligible
probability of success without ability. Their method appears
better adapted to further exploration of this problem than other
methods, preferably with r used as evaluation function. A valuable
extension of their study would include two-choice and four-choice
items as well as ones with no probability of success without
ability. Brogden’s method has provided the most detailed
information concerning the limits of the region of paradox and is
well adapted to further investigation of the relative merits of
concentrated and distributed difficulties under circumstances
where a test may be used for groups differing in mean ability.
Cronbach and Warrington’s method could also be used for the latter
problem. Tucker’s paper, being analytic, is likely to prove the
most substantial contribution to test theory of any reviewed
herein.


APPLICATION TO TEST 
CONSTRUCTION 


The statistical phenomena summarized
by the phrase “the attenuation
paradox” reveal the importance
of the mean and variability of the 
standardization sample in the evalua- 
tion of a test’s validity. Reading up 
and down Fig. 3, one can interpret 
the curves for differing values of z 
as showing a striking decrement in 
validity for tests composed of median 
equivalent items when they are 
subsequently applied to groups hav- 
ing a different mean on the trait 
measured. Similarly, the validity co- 
efficients obtained for a hypothetical 
set of tests with decreasing disper- 
sion of item difficulties correspond 
to those obtained by considering the 
items as constant but administered 
to groups of increasing variability. 
Generally speaking, the standardiza- 
tion sample should have the same 
mean and variance as the group on 
which the test will ultimately be 
used. In practical situations usually 
no such identity can be guaranteed, 
certainly not for the life expectancy 
of a well-constructed test. 

For tests in the low homogeneity or “classical region,” as the
variance of the sample increases, mean held constant, the validity
coefficient increases. Thus, to obtain a lower bound to the
validity coefficient, the standardization sample should be at most
as variable as any sample on which the test will be used.

For tests in the “region of paradox” the situation is more
complicated. For a given degree of item intercorrelation and a
given number of items, there is probably an optimal distribution
of item difficulties. The higher the item inter-r, the more it is
true that each item added to the test adds validity only if it
differentiates at a different level of difficulty than those items
already included. Any considerable increase in the variability of
the group will decrease the validity of the test. Thus, to obtain
a lower bound for the validity coefficient, the standardization
sample should be at least as variable as any sample on which the
test will be used.

Variation in the mean of the application sample from that of the
standardization group must also be considered. Cronbach and
Warrington concluded that where general usefulness over a period
of time is sought items should be concentrated around median
difficulty; however, they considered tests as selective
instruments, which is legitimate but different from the present
consideration of tests as measuring instruments, and their use of
biserial correlation is questionable. Whether concentration of
item difficulties makes a test more susceptible to loss of
validity in application to groups of different means is a point on
which evidence is not yet available.

Theoretical and practical considerations may be drawn together as
follows: When application to a single group is considered,
concentration of item difficulties is called for in the case of
item inter-r’s usually met, which are low. Consideration of use of
tests for differing groups will lead to dispersion of item
difficulties for some cases, i.e., where item inter-r’s are not
too low. In practice, however, it is usually not possible to find
large numbers of items exactly at the 50-50 point. The consequent
inevitable dispersion of difficulties is in many cases probably
desirable. A method of item selection that favors median items but
does not exclude good items with more extreme difficulties seems
indicated by these considerations. One such method has been
proposed recently (13), and no doubt other methods have this
property. That one widely used method of item selection has a very
different effect will be shown in the next section.


IMPLICATIONS FOR TEST THEORY 


Consider the definition: A para- 
doxical property of a test is a property 
such that the validity of the test is 
not a monotonic function of that 
property; i.e., validity is sometimes 
an increasing and sometimes a de- 
creasing function of the property. 

The objections to taking validity as the focal concept for test theory are well known. Probably all psychometricians would agree that test theory must have basic concepts which refer to intrinsic properties of tests. Is it not intuitively valid, however, to demand that the most basic concept of psychometrics shall be a nonparadoxical property of tests? Are there such nonparadoxical properties?

Clearly reliability, or one of its cognate concepts, is the focus of present-day test theory; the statistical theory of reliability is the bulk of classical test theory. There have been many criticisms of the reliability concept; those by Thorndike (17), Cronbach (2), and Loevinger (11) have a good deal in common, particularly the distinction between stability and homogeneity, which is lost in the ordinary usage of “reliability.” The purpose of the present article is not, however, to review all the objections to the reliability concept but to draw attention to a single instance in which the statistical theory of reliability leads to self-contradiction.

Such solutions and explanations of 
the attenuation paradox as have been 
proposed, those of Brogden, Davis, 
and Cronbach and Warrington, are 
appeals to common sense completely 
outside traditional theory of relia- 
bility. Perhaps the most paradoxi- 
cal aspect of the attenuation paradox 
is that Gulliksen, who appears to 
deserve credit for discovering it, 
failed to include any reference to it 
in his comprehensive summary (8) 
of mental test theory. In his own words, “Current test theory provides no rationale for rejecting” the anomalous score distribution.

Lord (14) has probably come closest to integrating the attenuation paradox with classical test theory. He points out that while Pearsonian correlation between test score and ability decreases in what is here called the region of paradox, the curvilinear correlation with ability constantly increases with increasing item inter-r. Unfortunately, however, consideration of curvilinear correlation obscures the paradox, which does in fact exist. Lord concerns himself chiefly with the classical region; it is not clear that his approach could lead to the distinction between the two regions of test theory.
Reliability is paradoxical; saturation, a concept imbedded in a method of test construction recently proposed by Loevinger, Gleser, and DuBois (13), is virtually identical with one of the Kuder-Richardson (10) coefficients and thus is paradoxical in the same region. Guttman’s (9) concept of scalability and Loevinger’s
(11) concept of homogeneity clearly 
differ from reliability and saturation. 
Are these properties paradoxical? 
Consideration of the results quoted 
in the second section of the present 
paper shows that these properties are 
also paradoxical, not, however, in 
what is here called the region of 
paradox but in the classical region. 
To distinguish the two kinds of para- 
doxical properties, scalability and 
homogeneity may be called neo- 
paradoxical properties. Only scale 
theory will be considered. Its im- 
portance is pointed up by the fact 
that virtually all recent contributions 
to test theory in sociology and closely 
related disciplines are phrased in 
terms of scale analysis. 

Guttman originally measured scalability in terms of a “coefficient of reproducibility,” which is the proportion of the responses of the group which can be reproduced from knowledge of item difficulties and total scores. Guttman distinguished “scales” from “quasi-scales” according to whether the coefficient of reproducibility exceeded .9, assuming certain other conditions were satisfied. Festinger (6), Loevinger (12), and probably others criticized the distinction between scales and quasi-scales as arbitrary. According to these writers, Guttman and his followers were erecting a qualitative difference out of a purely quantitative one. The considerations of the present paper, however, justify what was apparently a purely intuitive distinction on the part of Guttman. When we deal with cumulative (12) dichotomous items, which is the usual case (16), the distinction between the two regions of test theory can be clearly, if laboriously, drawn. What Guttman called quasi-scales are tests in what is here called the classical region; what Guttman called scales are tests in what is here called the region of paradox. There is no evidence, however, that workers in the field of scale analysis are aware of the precision with which this distinction can be made, nor of the consequences of the distinction which have been elaborated by psychologists working in test theory.

A number of objections have been raised against the original coefficient of reproducibility, a major one being that its lower limit can be arbitrarily raised by selecting items extreme with respect to difficulty (or popularity). In practice, workers in this field have probably always more or less taken this fact into consideration. A recent paper by Menzel (15) has proposed a modification of the coefficient of reproducibility which eliminates this problem. Clearly, insofar as use of Guttman’s coefficient led to the selection of items extreme in difficulty as opposed to median items, it led to a decrease in validity of the resultant test, since under none of the conditions investigated above did a selection of extreme rather than median items lead to an increase in validity.

Stouffer, Borgatta, Hays, and 
Henry (16) report that, in practice, 
in order to derive scales rather than 
quasi-scales from available data, it 
has been necessary to select one from 
among several apparently equally 
good items at any given difficulty 
level. According to them, probably 
the most common method of im- 
proving scalability has been reduction 
of number of items per test, often to 
no more than four or five items. 
Inspection of Fig. 1 and 2 and other 
available data reveals no set of con- 
ditions under which reduction in 
number of items increases validity;
on the contrary, increase in number
of items invariably increases validity, 
most markedly when the number of 
items is small. 

A further consequence of using the 
concept of perfect scales as the de- 
sideratum in test construction has 
been that test constructors have been 
forced to select items that are more 
or less evenly spread in difficulty. 
Such selection will tend to increase 
scalability, but in the classical region 
it will tend to decrease validity. 

In summary, Guttman's coeffi- 
cient of reproducibility gives advan- 
tage to extreme as opposed to median 
items. If this coefficient is modified 
so as to give advantage neither to 
extreme nor to median items, pursuit 
of scalability still leads to decrease 
in number of items and to dispersion 
of item difficulties. But the evidence 
cited in this review is that decrease 
in number of items always leads to 
decrease in validity, other things 
equal; there is no set of conditions 
under which extreme items lead to 
more valid tests than median items; 
and in the classical region, dispersion 
of item difficulties leads to decrease 
in validity. There is reason to believe that most of the attitude tests
currently in use fall in the classical 
region. One may conclude that ex- 
tensive use of scale analysis has al- 
most certainly led to loss of validity 
in tests used in sociology. 

In recognition of the difficulties 
to which scale analysis has led, more 
or less akin to those cited here, Stouf- 
fer, et al. (16) have proposed a modifi- 
cation of the method. They continue 
to have about five items per scale, 
but each item is a “contrived item.” 
A contrived item is a composite of 
several items similar in level of diffi- 
culty, but each contrived item is 
scored zero or one, depending on the 
number of pluses in the items of
which it is composed. The contrived
items are constructed by a cut-and- 
try method. The authors show that 
tests constructed by their method are 
superior to those constructed by the 
more common method of scale analy- 
sis with respect to scalability and 
some definition of reliability, but 
only where the original items are 
quite good. Their method appears 
to be a compromise between scale 
analysis and methods favoring choice 
of equivalent items, to which most 
psychologists would lean. While the 
new method may ameliorate the 
flaws of scale analysis, it does not 
solve the conceptual difficulty. Scal- 
ability, like reliability, is a paradoxi- 
cal concept. Worse, scalability is 
paradoxical in the classical region, 
which is heavily populated with 
tests, while coefficients such as the 
Kuder-Richardson (10) reliability co- 
efficients are paradoxical only in the 
region of paradox, which is thinly 
populated with tests.⁴

The problem of finding nonparadoxical properties of tests remains. A property different from those discussed above which may qualify is the discriminating power of the test. Many writers have used this term without definition, as if it were self-explanatory. Several recent papers have provided indices of discriminating power; full consideration of these indices would lead far afield. A single approach will be selected arbitrarily to show that this concept differs from concepts like reliability and scalability.

4 Dr. Ledyard Tucker has, however, called my attention to the fact that a problem being worked on by John Keats at Princeton University provides a mathematical model for the method of “contrived items.” Keats assumes that the probability of success on an item increases with ability at only one point on the scale of ability, rather than describing an ogive. It is not assumed that the probability of success on the item is zero below that point nor unity above it, only that it is constant everywhere but at the single point. One has difficulty thinking of test content for which this assumption is as reasonable as the assumption of a graded increase in the probability of success on items.

Loevinger, Gleser, and DuBois (13) have distinguished three aspects of discriminating power: fineness, probability, and range. Discriminating fineness refers to the size of the differences in the trait which can be discriminated. Discriminating probability refers to the proportion of discriminations which are in the same direction as trait differences. Discriminating range refers to the general level of the trait at which discriminations are made. No single coefficient measures the three aspects of discriminating power; however, some coefficients are closely related to discriminating fineness, some to discriminating probability, and some to both. If the test construction process is conceived as beginning with the best few items in a finite pool and adding items one at a time from the pool in the order of the goodness of the items, then coefficients measuring only fineness constantly increase, those measuring only probability constantly decrease, while those measuring both may increase at first and then decrease. The latter type of coefficient provides a basis for deciding when to stop adding items to the test.

This definition of discriminating power may prove unappealing to many psychometricians, since no single coefficient corresponds to it and since it is essentially an intuitive rather than a quantitative concept. Yet other quantitative disciplines begin frankly with intuitive concepts. As Ruth Tolman (18) has recently observed, among physical scientists it is the highest compliment to speak of a fellow scientist as having “physical intuition,” but among those psychologists who most desire to emulate the physical scientists, intuition is rarely referred to in a similar complimentary sense.

In summary, a paradoxical property of a test is a property such that validity is not a monotonic function of that property. Reliability is a property paradoxical in a region lightly populated with tests. Scalability is a property paradoxical in the region heavily populated with tests. Other possible properties, such as the discriminating power of the test, have not been fully investigated.


REGRESSION ANALYSIS: PREDICTION FROM
CLASSIFIED VARIABLES¹


ROBERT M. GUION

Occasionally in psychological research a qualitative independent variable seems promising in the prediction of a quantitative variable. Sometimes, particularly in applied research, data for basically continuous independent variables must be obtained, if at all, in terms of broad, discrete categories, possibly of unequal intervals. Such variables are at best inconvenient to work with in prediction problems; all too often, the difficulties they present are solved by ignoring them, resulting in the loss of a potentially useful predictor.

In a study of supervisors (2), the writer was faced with this problem. It was necessary to develop a regression equation by which a dependent variable (number of employees supervised) could be predicted from such independent variables as man-hours of assistance, rough classifications of plant size, and the purely qualitative variable of industry classification. The problem was solved by a technique of regression analysis. Regression analysis is a least-squares regression technique permitting the prediction of a quantitative variable from categorized or qualitative variables.² There seems to be no particular limit to the number of variables that can be handled by this method, as there is in the usual multiple regression techniques. Apparently the absence of products in the resulting regression equation reduces the likelihood of cumulating to a serious degree the errors of measurement. It is, of course, possible that spurious correlation might result from including too many variables.

1 This article describes a procedure used in a thesis submitted to the Graduate School of Purdue University in partial fulfillment of the requirements of the degree of Doctor of Philosophy, carried out under the direction of Dr. C. H. Lawshe, and sponsored by the Purdue Research Foundation.

2 The theory and procedure presented here under the name of regression analysis were developed primarily by Professor I. W. Burr of the Statistical Laboratory of Purdue University. It was subsequently learned that similar procedures based upon the same mathematical theory have been applied to biometric data.

It is beyond the intent of this report to give a complete account of the theory and practice of regression analysis. The procedure, which has been used in agricultural research, is capable of further refinement and needs evaluative research. A description of the basic theory and of the method's application in agriculture is provided by Anderson and Bancroft (1). This report seeks merely to outline the procedure as modified in the writer's use of it.

The technique will be discussed and outlined by example, using a three-variable situation that might occur in personnel research. For purposes of illustration, we will be concerned with predicting the average number of working days attendance per quarter year of machine operators. The prediction will be based upon a knowledge of the applicant’s categorical standing in each of three variables:

Variable A: Preference for mechanical work
1. First or second choice
2. Third choice or not indicated

Variable B: Housing status
1. Own home
2. Rent home
3. Rooming

Variable C: Presently employed?
1. Yes
2. No


There are seven categories in all in this example; any applicant may be classified as being in three of them. Regression analysis seeks to estimate eight parameters: $\alpha_1$, $\alpha_2$ for the two categories of variable A; $\beta_1$, $\beta_2$, $\beta_3$ for the three categories of variable B; $\gamma_1$, $\gamma_2$ for the two categories of variable C; and $\mu$ for the mean attendance (dependent variable) of the general population of applicants. The sum of the parameters for the categories describing any individual, when added to the population mean, yields the dependent measure for that individual within limits of error, such that the regression model is

$$x = \alpha_i + \beta_j + \gamma_k + \mu + \text{error}.$$


From a sample available for study, we can estimate these parameters (estimates indicated by italic letters) so that

$$x = a_i + b_j + c_k + m.$$


In other words, an adjustment fac- 
tor is assigned to each category of 
each variable. For any individual 
applicant in the illustrative situa- 
tion, the predicted attendance record 
would be equal, within limits of error, 
to the mean attendance record of the 
total sample plus the algebraic sum 
of the adjustment factors of the cate- 
gories that describe him. 




A least-squares solution of these 
parameter estimates is sought by 
using a system of eight simultaneous 
equations (the number of categories 
plus one) by the following procedure: 

1. Prepare the matrix of the system of equations. The equation for the first category of the first variable is

$$\sum X_{a_1} - n_{a_1}a_1 - n_{a_1 b_1}b_1 - n_{a_1 b_2}b_2 - n_{a_1 b_3}b_3 - n_{a_1 c_1}c_1 - n_{a_1 c_2}c_2 - n_{a_1}m = 0.$$


$\sum X_{a_1}$ is the sum of the measures of the dependent variable (i.e., attendance in days) for all cases appearing in the first category of variable A: those who have indicated on the application blank a first or second choice for mechanical work. The frequency, or number of cases, appearing in this category is designated $n_{a_1}$. The notation $n_{a_1 b_1}$ indicates the number of cases in this category which are also classified in the first category of variable B: those who own their homes. The next six equations follow this form, being equations for the second category of variable A, the first category of variable B, and so on.

The final equation is for the general mean and is

$$\sum_{i=1}^{N} X_i - n_{a_1}a_1 - n_{a_2}a_2 - n_{b_1}b_1 - n_{b_2}b_2 - n_{b_3}b_3 - n_{c_1}c_1 - n_{c_2}c_2 - Nm = 0.$$

Transposing all parameter estimates and their coefficients yields a matrix equal to the vector of the dependent variable. This 8 by 8 matrix of independent variables consists of a set of frequencies as coefficients of the unknown parameter estimates. The complete system of equations, in matrix form blocked off for illustrative purposes, is shown as Table 1.

It will be seen that the total frequency in any given category appears in the diagonal, and that the coefficients of the diagonal are equal to the coefficients of the last equation for the general mean.

The sample used in this illustration consists of 49 machine operators, 26 of whom have indicated machine work either as their first choice or second (first category, variable A). Of these 26 men, 7 own their homes, 6 are renting, and 13 are rooming or living with their parents (the three categories of variable B, respectively). They totaled 1,417 working days attendance during a three-month period.

Blocking the matrix off as shown in Table 1 provides a convenient check on the accuracy of the tabulations made: the sum of the frequencies in any given block must be equal to the total number of cases in the sample, unless there are individuals included for whom complete data are not available. In any column within a block, the sum of the coefficients is equal to the coefficient in that column in the equation for the general mean.

2. Reduce the matrix. This system of equations, as it now stands, is additive since the sum of the coefficients of the equations for each variable yields the appropriate coefficients for the equation for the general mean. This set of equations does not have full rank and therefore has no unique solution.

A solution can be obtained by making the restriction that the sum of the estimated parameters, or a weighted sum, be equal to zero for any given variable. With this restriction, the process of reparametrization, discussed by Kempthorne (3), can be applied. Reparametrization seeks to replace certain parameters and solve for a different set. It can be done by setting one parameter in each variable equal to zero, which results in each case in the elimination from the matrix of one row and one column through the diagonal element of that parameter.³ In the illustrative case, eliminating the last category of each variable (coefficients in Table 1, italicized), we have the following reduced 5 by 5 matrix of equations to solve instead of the original 8 by 8:

$$\left[\begin{array}{ccccc|c}
26 & 7 & 6 & 11 & 26 & 1417 \\
7 & 13 & 0 & 10 & 13 & 743 \\
6 & 0 & 15 & 6 & 15 & 842 \\
11 & 10 & 6 & 23 & 23 & 1260 \\
26 & 13 & 15 & 23 & 49 & 2700
\end{array}\right]$$

This reduced matrix can be solved.⁴ The solutions for the illustrative problem are shown in Table 2.
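
The preparation and reduction of the normal equations can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the ten records below are hypothetical (they are not the 49 operators of the illustration), the function names are invented for the example, and exact elimination with rational numbers stands in for whatever solution method is actually chosen.

```python
from fractions import Fraction

# Hypothetical records: (category of A, category of B, category of C, attendance).
records = [
    (1, 1, 1, 58), (1, 2, 1, 57), (1, 3, 2, 52), (2, 1, 2, 59), (2, 2, 1, 55),
    (2, 3, 1, 50), (1, 1, 2, 60), (2, 3, 2, 53), (1, 2, 2, 56), (2, 1, 1, 57),
]


def indicator_row(a, b, c):
    """0/1 coding of one case in the order a1 a2 b1 b2 b3 c1 c2 m."""
    row = [0] * 8
    row[a - 1] = 1
    row[2 + b - 1] = 1
    row[5 + c - 1] = 1
    row[7] = 1  # every case contributes to the general mean
    return row


def normal_equations(recs):
    """Return the 8 by 8 matrix of frequencies and the vector of sums."""
    rows = [indicator_row(a, b, c) for a, b, c, _ in recs]
    ys = [rec[3] for rec in recs]
    matrix = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    vector = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(8)]
    return matrix, vector


def reduce_system(matrix, vector, drop=(1, 4, 6)):
    """Delete the rows and columns of the last category of each variable."""
    keep = [i for i in range(len(vector)) if i not in drop]
    return [[matrix[i][j] for j in keep] for i in keep], [vector[i] for i in keep]


def solve(matrix, vector):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(vector)
    aug = [[Fraction(x) for x in row] + [Fraction(v)] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


full_matrix, full_vector = normal_equations(records)
reduced_matrix, reduced_vector = reduce_system(full_matrix, full_vector)
solution = solve(reduced_matrix, reduced_vector)  # a1', b1', b2', c1', m'
```

In the full matrix the two rows for variable A add up to the row for the general mean, which is the additivity that leaves the 8 by 8 system without a unique solution; the reduced 5 by 5 system can be solved directly.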

3. Solve the equations. There are, of course, many methods of solving sets of simultaneous equations. However, in systems as large as will be found in most cases of regression analysis, loss of significant figures in the course of computation will become a serious problem if direct reduction or a determinant method is used. In the supervisory study cited, for a matrix of 27 equations, the Gauss-Seidel iterative method was used. The 134 iterations required to reach a solution were performed on the IBM Card-Programmed Calculator using a procedure described by Liggett (4).
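
For readers unfamiliar with the Gauss-Seidel method, a minimal sketch follows. It is not the card-programmed procedure described by Liggett; the small system, the tolerance, and the function name are assumptions chosen only to show the idea of sweeping through the equations and reusing each newly computed value at once.

```python
def gauss_seidel(matrix, vector, tol=1e-10, max_iter=1000):
    """Iteratively solve matrix * x = vector, starting from all zeros.

    Convergence is assured for diagonally dominant or symmetric
    positive-definite systems; other systems may fail to converge.
    """
    n = len(vector)
    x = [0.0] * n
    for iteration in range(1, max_iter + 1):
        largest_change = 0.0
        for i in range(n):
            # Use the most recent values of all other unknowns.
            others = sum(matrix[i][j] * x[j] for j in range(n) if j != i)
            new_value = (vector[i] - others) / matrix[i][i]
            largest_change = max(largest_change, abs(new_value - x[i]))
            x[i] = new_value
        if largest_change < tol:
            return x, iteration
    raise RuntimeError("Gauss-Seidel did not converge")


# Hypothetical, diagonally dominant system whose exact solution is (1, 2, 3).
example_matrix = [[4.0, 1.0, 2.0], [1.0, 5.0, 1.0], [2.0, 1.0, 6.0]]
example_vector = [12.0, 14.0, 22.0]
gs_solution, gs_iterations = gauss_seidel(example_matrix, example_vector)
```

Each sweep costs only one pass over the coefficients, which is why an iterative scheme of this kind suited a large system on limited equipment.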


3 The simplification of reparametrization is largely the result of work done by L. E. Grosh, T. E. Cheatham, and Professor V. L. Anderson of the Statistical Laboratory, Purdue University.

4 Recognition should be given Raymond Woods, Department of Mathematics, Bowling Green State University, for his labor in solving the problem used here for purposes of illustration.

TABLE 1
MATRIX OF COEFFICIENTS OF THE UNKNOWN PARAMETER ESTIMATES AND
VECTOR OF DEPENDENT VARIABLE

                        Coefficients of Parameters             Vector
Categories            a1   a2   b1   b2   b3   c1   c2    m     (ΣX)
Variable A
  Category 1          26    0    7    6   13   11   15   26     1417
  Category 2           0   23    6    9    8   12   11   23     1283
Variable B
  Category 1           7    6   13    0    0   10    3   13      743
  Category 2           6    9    0   15    0    6    9   15      842
  Category 3          13    8    0    0   21    7   14   21     1115
Variable C
  Category 1          11   12   10    6    7   23    0   23     1260
  Category 2          15   11    3    9   14    0   26   26     1440
General Mean          26   23   13   15   21   23   26   49     2700

Note. The values italicized in the original (the rows and columns for a2, b3, and c2) are those eliminated in reparametrization.

4. Convert the obtained values. The solution after reparametrization yields for the illustration the values $a_1'$, $b_1'$, $b_2'$, $c_1'$, and $m'$, since the last category of each variable had been set equal to zero. Conversion of these estimates to estimates of the original set of parameters is accomplished by simple algebra, the actual process depending upon the restriction first imposed. One of three restrictions might have been made:

1. The sum of the estimated parameters (or adjustment factors) can be set equal to zero, if the population can be assumed to be equally distributed among the various categories of the variable.

2. The sum of the estimated parameters of each variable, weighted according to the observed frequencies of the categories, can be set equal to zero, if it is assumed that the sample is random.

3. The sum of the estimated parameters of each variable, given a priori weights on the basis of known or hypothesized population frequencies, can be set equal to zero.

TABLE 2
SOLUTIONS TO ILLUSTRATIVE EQUATIONS

Parameter    Prime Solution*    Parameter Estimate
a1               -1.048              -0.492
a2                                   +0.556
b1               +4.918              +2.689
b2               +3.020              +0.791
b3                                   -2.229
c1               -1.865              -0.990
c2                                   +0.875
m               +54.316             +55.102

* Solution to reparametrized matrix

The second restriction seems more common and will be used in the example. Following Kempthorne (3), it can be shown that $a_i' + a_k = a_i$. It is therefore necessary to find the kth value for each variable, i.e., the value eliminated in reparametrization. The restriction is that $\sum_{i=1}^{k} n_{a_i} a_i = 0$, or, in the example where variable B has three categories, $\sum_{i=1}^{3} n_{b_i} b_i = 0$. Substituting, this becomes in full

$$n_{b_1}(b_1' + b_3) + n_{b_2}(b_2' + b_3) + n_{b_3} b_3 = 0.$$

Simplifying and solving for $b_3$,

$$b_3 = -\frac{n_{b_1} b_1' + n_{b_2} b_2'}{N},$$

or, in the illustration,

$$b_3 = -\frac{(13)(4.918) + (15)(3.020)}{49} = -2.229,$$

and

$$b_1 = b_1' + b_3 = 4.918 - 2.229 = 2.689$$
$$b_2 = b_2' + b_3 = 3.020 - 2.229 = 0.791.$$


The same general procedure is used 
to obtain the correct parameter esti- 
mates, or adjustment factors, for 
variables A and C, shown in Table 2. 
For the general mean, it is best to 
use the simple arithmetic mean as 
normally computed because of the 
rounding errors which creep into the 
corrected value after the solution 
through reparametrization. In this 
problem, the mean is 55.102. 
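
The conversion under the second restriction can be checked numerically. The sketch below is illustrative only; the function name and its argument layout are assumptions, while the figures (prime solutions 4.918 and 3.020, frequencies 13 and 15, N = 49) are those of the illustration for variable B.

```python
def convert_prime_solution(prime, kept_frequencies, n_total):
    """Convert a reparametrized solution back to adjustment factors.

    prime            -- prime solutions for all categories but the last
    kept_frequencies -- observed frequencies of those same categories
    n_total          -- total number of cases in the sample
    The last category's factor follows from the frequency-weighted
    sum-to-zero restriction; each other factor is its prime value plus it.
    """
    last = -sum(n * p for n, p in zip(kept_frequencies, prime)) / n_total
    return [p + last for p in prime] + [last]


b1, b2, b3 = convert_prime_solution([4.918, 3.020], [13, 15], 49)
rounded_b = [round(value, 3) for value in (b1, b2, b3)]
# The weighted sum of the three factors returns to zero, as the restriction requires.
weighted_sum = 13 * b1 + 15 * b2 + (49 - 13 - 15) * b3
```

The same function applies to variables A and C by supplying their prime solutions and frequencies.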

With these values known, the linear equation

$$x = a_i + b_j + c_k + m$$

can be used for each applicant. In the illustration, we can now, by knowing the applicant’s classification in the three variables and by knowing the mean attendance of the sample, predict the applicant’s job performance in terms of expected attendance per quarter.

For example, an applicant who lists mechanical work as his first job choice (category $a_1$, -0.492), who owns his home (category $b_1$, +2.689), and who is presently employed (category $c_1$, -0.990) would be expected to have a quarterly attendance record, in integral units, of 56 working days, the prediction formula being

$$x = 55.102 - 0.492 + 2.689 - 0.990 = 56.309.$$

The relative influence of each independent variable can be gauged according to the variance of the parameter estimates, such that, in this problem, variable B carries a heavier weight in prediction than variable C, which is in turn weighted more heavily than variable A.
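
The prediction and the ranking of the variables can be reproduced with a short sketch. The adjustment factors and the mean are those of Table 2; the dictionary layout and function name are assumptions made for the example, and the unweighted variance of each variable's factors is only one plausible reading of "the variance of the parameter estimates."

```python
from statistics import pvariance

MEAN_ATTENDANCE = 55.102  # simple arithmetic mean of the sample

# Adjustment factors from Table 2, keyed by variable and category number.
ADJUSTMENTS = {
    "A": {1: -0.492, 2: 0.556},
    "B": {1: 2.689, 2: 0.791, 3: -2.229},
    "C": {1: -0.990, 2: 0.875},
}


def predict_attendance(a, b, c):
    """Mean attendance plus the factor for each category describing the applicant."""
    return (MEAN_ATTENDANCE
            + ADJUSTMENTS["A"][a]
            + ADJUSTMENTS["B"][b]
            + ADJUSTMENTS["C"][c])


# First or second choice of mechanical work, owns home, presently employed.
predicted = predict_attendance(1, 1, 1)

# Assumed reading: spread of each variable's factors, largest first.
spread = {name: pvariance(list(levels.values())) for name, levels in ADJUSTMENTS.items()}
ranking = sorted(spread, key=spread.get, reverse=True)
```

Under this reading the ordering of the variables agrees with the one stated in the text: B, then C, then A.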

It should be pointed out that this 
problem has been illustrative only; 
where the units are discrete rather 
than continuous, which is true of 
both the frequencies of individuals 
in various categories and of the meas- 
urements of attendance, there is 
probably very little justification in 
attempting to achieve greater than 
integral accuracy. 


CONCLUDING COMMENTS 


Regression analysis is a least-squares multiple regression technique permitting the prediction of quantitative variables from qualitative or classified variables. As outlined here, regression analysis assumes no significant interactions between variables. If interaction is found to exist, additional equations can be introduced, treating combinations of interacting variables as separate classifications. It should be pointed out, for the sake of economy, that any error made by a false assumption of no interaction is an error of underestimating rather than overestimating the relationship; the effect is one of ignoring additional variables which could have increased the predictive efficiency.

The regression equation derived from this method does not directly provide certain interpretational data. However, the predicted values of the dependent measure can be correlated by conventional procedures with the actual values. These correlation coefficients can then be used to derive coefficients of determination and standard errors of estimate.

The potential uses of the method are many. Perhaps the most obvious are its applications to personal history analysis and to profile analysis. Nevertheless, it should be made the subject of empirical investigations. Several research questions need answering. Its predictive effectiveness in comparison with other procedures, such as the Horizontal Per Cent Method (5, p. 256), needs to be determined, as does the effect of coarse grouping or of the number of variables used.


REFERENCES

1. ANDERSON, R. L., & BANCROFT, T. A. Statistical theory in research. New York: McGraw-Hill, 1952.
2. GUION, R. M. The employee load of first line supervisors. Personnel Psychol., 1953, 6, 223-244.
3. KEMPTHORNE, O. Design and analysis of experiments. New York: Wiley, 1952.
4. LIGGETT, I. C. Two applications of the IBM card-programmed electronic calculator. Proc. Industr. Computations Seminar, September, 1950, 62-65.
5. STEAD, W. H., SHARTLE, C. L., & ASSOCIATES. Occupational counseling techniques; their development and application. New York: American Book Co., 1940.

Received September 16, 1953.





PSYCHOLOGICAL BULLETIN 
Vol. 51, No 5, 1954 


ESTIMATING THE SCALABILITY OF A SERIES OF 
ITEMS—AN APPLICATION OF 
INFORMATION THEORY

The powerful scaling technique developed by Guttman and others (3, 4, 5, 10) has several advantages, some of which seem to be unique. The merits and limitations of the method have been evaluated elsewhere (1, 2, 6), and they will not be reviewed here. The present paper will be restricted to a consideration of the problem of estimating the degree of scalability present in the area in question; i.e., the degree to which items in the scale are measuring the same thing in the population members.

Guttman discusses several factors 
which should be considered in this 
connection, namely, (a) the number 
of items, (b) the number of respond- 
ents, or size of the population sam- 
ple, (c) the number of errors, or re- 
sponses which do not fit the pattern 
of a perfect scale, (d) the distribution 
of these nonfitting responses, (e) the 
number of response categories of 
the items, and (f) the distribution of 
the item marginals. The first three of these criteria are used to compute a coefficient of reproducibility, $r$, which is equal to $1 - p_E$, where $p_E$ is the proportion of errors among the total number of responses. This coefficient should equal or exceed .90. The fourth factor is determined
by an inspection of the response 
pattern. Nonfitting responses should 
be well scattered, indicating a ran- 
dom distribution, rather than appear 
in clusters, which would indicate a 
systematic distortion of the scale 
pattern. The last two criteria are ac- 


counted for by rule-of-thumb pro- 
cedures. The more response cate- 
gories given to the items individually, 
and hence the larger total number of 
response categories, the less likely is 
a scale pattern to appear by chance 
alone. It is recommended that the 
number of items be ten or more if 
they are all dichotomous, so that the 
number of response categories is at 
least twenty. Likewise it is recom- 
mended that there be included no 
more than a few items that have 
almost all responses lumped under a 
single alternative, as such items
inevitably boost the coefficient of
reproducibility. For example, an item
to which 95 per cent of the
respondents answer “agree” could not
possibly result in more than 5 per cent
nonfitting responses, while an item
with a 50-50 split between two cate- 
gories could theoretically result in 50 
per cent errors. Items with more than 
two alternatives might produce even 
at best more than 50 per cent non- 
fitting responses if not removed from 
the series. 
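
The two quantities discussed above can be illustrated with a short sketch. It assumes that the responses have already been arranged against the scale pattern, so that the number of nonfitting responses is known; the function names are illustrative and do not come from Guttman's procedures. The second function encodes the bound suggested by the example in the text, taking an item's largest possible share of nonfitting responses to be the share of responses outside its most popular category.

```python
def reproducibility(n_errors, n_respondents, n_items):
    """Coefficient of reproducibility r = 1 - p_E, where p_E is the
    proportion of nonfitting responses among all responses."""
    total_responses = n_respondents * n_items
    return 1.0 - n_errors / total_responses


def max_error_proportion(marginals):
    """Largest share of nonfitting responses one item can contribute,
    taken here (an assumption) as the share outside its modal category."""
    return 1.0 - max(marginals)


# 100 nonfitting responses among 100 respondents x 10 items
print(reproducibility(100, 100, 10))
# A lopsided item and an evenly split item
print(max_error_proportion([0.95, 0.05]))
print(max_error_proportion([0.50, 0.50]))
```

On this reading a 95-5 item can contribute at most 5 per cent errors whatever the respondents' scale positions, which is why several such items raise r without adding evidence of scalability.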

Because of the effects which the 
number of categories and the distri- 
bution of category frequencies have 
on reproducibility coefficients, they 
are not strictly comparable as gener- 
ally computed. A means of quantita- 
tively accounting for these effects 
would be especially useful because 
it is standard practice to administer 
each item with several categories, 
and then to combine adjacent cate- 
gories in an attempt to get r up to 
.90. And, of course, the category
frequencies are changed at the same
time. Another common practice in 
which these effects play an important 
part is that of selecting a subgroup of 
items from among those adminis- 
tered. In both cases the manipula- 
tion is selective, favoring the result 
sought by the experimenter, and 
should be corrected for. 

The concept of entropy¹ has been
applied to the theory of information
by Wiener (11) and by Shannon (9).
This interesting concept provides the
means for such a correction. The
entropy H associated with a contingent
event is an indication of the number
of possible outcomes it can have. It
is consequently a measure of the un- 
certainty that the event will occur 
in any specified one of these possible 
ways. The toss of a die, with six 
equally likely possible outcomes, in- 
volves more entropy than the toss of 
a coin, with only two equally likely 
outcomes possible. Knowing the 
outcome of a high-entropy situation 
gives us more information, in the 
sense that more uncertainty is re- 
moved, than knowing the outcome of 
a low-entropy situation. 

How does this apply to estimating 
the degree of scalability present in an 
area? If we make our estimation 
first under one set of conditions and 
then make a second estimation under 
a different set of conditions having 
a different associated entropy value, 
the two estimates are not comparable 
unless we can allow for this difference. 
This is true whether we have tested 
in the same area twice or in two com- 
pletely different areas. The situation 
is somewhat analogous to comparing 


1 It has been so named because of its
mathematical and conceptual similarity to the term
entropy used in statistical mechanics, which in 
turn was named after the entropy of classical 
thermodynamics. For another application of 
entropy-like considerations in psychology, see 
Miller and Frick (7).
two chi-square values without
considering a difference in the degrees of
freedom. Let us imagine, for exam- 
ple, that we have given ten trichot- 
omous items to 100 respondents. 
Then we arrange the responses so 
that they form as nearly as possible a 
scale pattern, using any of the avail- 
able methods (3, 5, 10). Say the co- 
efficient of reproducibility r turns 
out to be .80. This does not meet the 
accepted standard of .90, and we 
begin to judiciously combine adjacent 
categories so that we are left with ten 
dichotomous items and an r of .90. 
Is this really any stronger evidence 
that the area is satisfactorily scalable? 
The proportion of responses which 
do not fit the scale pattern has been 
cut in half, but perhaps the restruc- 
turing of the situation has reduced 
the likelihood of such nonfitting re- 
sponses by as much or more. The 
more we manipulate the situation to 
our advantage, the less chance there 
will be for errors to show themselves. 
Our judicious manipulation reduces 
the entropy and imposes greater re- 
strictions on the responses, forcing 
them to more nearly fit the scale pat- 
tern. 

To see how much this chance for 
errors to appear has been reduced, 
let us calculate entropy values for 
conditions before and after combining 
categories. The formula is 


$$H = -\sum_{i=1}^{n} p_i \log_2 p_i, \qquad (1)$$


in which $p_i$ is the relative probability
of the event $i$ occurring.² As $p_i$ will
always be less than 1, if there is any
uncertainty whatever about the
outcome, $\log_2 p_i$ will be negative and H

2 Shannon (9) discusses the characteristics
which a good indicator of uncertainty should
have and shows that this is the only form of
expression which does have these
characteristics.

will be positive. The logarithm is
taken to a base of 2 only because H
then takes on a convenient numerical
value: it then indicates how
many times the number of equally 
likely possibilities must be halved 
to remove all uncertainty about the 
outcome. An H value of 1.00 would 
represent an event such as the toss 
of a coin, in which the equally likely 
possibilities are reduced from 2 to 1 
when the outcome becomes known. 
This same H value would also
characterize a single respondent checking
a dichotomous item, assuming we
have no a priori knowledge about
which category he is more likely to
endorse, i.e., assuming $p_1 = p_2 = .5$. In
either the case of the coin or the single
respondent answering a single
dichotomous item, formula 1 reduces to


$$H = -.5 \log_2 .5 - .5 \log_2 .5 = -.5(-1) - .5(-1) = 1.00.$$
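
The coin and die comparison can be checked numerically. The following is a minimal sketch of formula 1 with logarithms to the base 2; the function name `entropy` is an illustrative choice, and outcomes with zero probability are skipped on the usual convention that they contribute nothing.

```python
import math


def entropy(probabilities):
    """Entropy in bits: H = -sum(p * log2(p)); outcomes with p = 0 add nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


coin = [1 / 2, 1 / 2]   # two equally likely outcomes
die = [1 / 6] * 6       # six equally likely outcomes

print(f"coin: {entropy(coin):.3f} bits")
print(f"die:  {entropy(die):.3f} bits")
```

The die yields the larger value, in line with the statement that knowing the outcome of a high-entropy situation removes more uncertainty.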


But in our imaginary example above,
we had 100 respondents instead of
only one, and so we need not assume
that all the presented choices for
an item are equally likely to be
chosen. We obtain an estimate of the
relative probabilities from the
response frequencies, and the larger
the group, the better will be our
estimates. The response frequencies give
us a set of $p_i$'s for each item such that
$\sum p_i = 1.00$; using these $p_i$'s, we can
calculate an H value for each item
in the series. Then we sum item H's,
and the result is an H value
characteristic of our series of items. This
value indicates how much information
is obtained (i.e., how much
uncertainty is removed) each time we
administer our item series to a
respondent. The formula may be
written


$$H = -\sum_{j=1}^{J} \sum_{i=1}^{c_j} p_{ij} \log_2 p_{ij}, \qquad (2)$$


where $p_{ij}$ is the relative probability
of category $i$ on item $j$ being
endorsed, $J$ is the number of items in
the series, and $c_j$ is the number of
response categories presented with
item $j$. The summation is first over
categories within items and then over
items. Note that this formula does
not take into consideration the pat- 
terning of the responses. It is the 
coefficient of reproducibility which 
does this, and that is why the two 
measures must be considered together 
in order to see the whole picture. 
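
Formula 2 can be sketched directly from response frequencies. The example below is illustrative only: the three items and their frequencies are hypothetical, and the function names are not from the paper. Each item's proportions are estimated from its counts, an entropy is computed per item, and the item values are summed.

```python
import math


def item_entropy(counts):
    """Entropy in bits of one item, estimated from its response frequencies."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def series_entropy(items):
    """Sum of item entropies over the series (categories first, then items)."""
    return sum(item_entropy(counts) for counts in items)


# Hypothetical response frequencies for 100 respondents on three items.
items = [
    [50, 50],        # evenly split dichotomy
    [80, 20],        # lopsided dichotomy
    [40, 35, 25],    # trichotomy
]

for counts in items:
    print(counts, round(item_entropy(counts), 4))
print("series H:", round(series_entropy(items), 4))
```

As the text notes, nothing in this calculation looks at the patterning of responses; it uses the marginals alone.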
Returning once more to our
example, suppose we compute the entropy
before, $H_1$, and after, $H_2$, combining
categories with the result that $H_1/H_2 = 2$.
Then the nonfitting responses
have been reduced only in proportion
to the reduction in entropy. We
should conclude that the increase in 
r is most likely due to sheer manipu- 
lation. It should be pointed out here 
that increased reproducibility after 
combining categories can mean more 
than this. Such would be the case if 
respondents were differentiating be- 
tween the first category on an item 
and the other two according to their 
true scale position, but were distin- 
guishing between the second and 
third categories only by chance, or
by habits of expression, or by any
other extraneous factor. But before
we could conclude that this sort of 
thing was happening, it seems reason- 
able that we should require that the 
errors, or nonfitting responses, be 
reduced as much as or more than the 
entropy, so that 


$$p_{E_1} / p_{E_2} \geq H_1 / H_2 \qquad (3)$$


where $p_{E_1}$ and $p_{E_2}$ are the relative
proportions of errors before and after
combining categories, respectively.
Lowering H means a greater structuring
of the situation, and as we have
been deliberately structuring it in
our favor, we must demand not only
a reduction in errors, but a reduction
at least in proportion to this increased
structuring.
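
The requirement that errors be reduced as much as or more than the entropy can be sketched as a simple comparison of two ratios. The marginals below are hypothetical and chosen only to show one case that fails the requirement and one that meets it; in both, r is taken to rise from .80 to .90 as in the running example.

```python
import math


def entropy(probabilities):
    """Entropy in bits of one item, given its category proportions."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def errors_reduced_enough(p_e1, p_e2, h1, h2):
    """True when the error ratio is at least as large as the entropy ratio."""
    return p_e1 / p_e2 >= h1 / h2


# Hypothetical case 1: ten items with proportions .2/.4/.4, the last two
# categories then being combined to give .2/.8.
h1 = 10 * entropy([0.2, 0.4, 0.4])
h2 = 10 * entropy([0.2, 0.8])
print(round(h1 / h2, 3), errors_reduced_enough(0.20, 0.10, h1, h2))

# Hypothetical case 2: ten items split in equal thirds, combined to 1/3 and 2/3.
g1 = 10 * entropy([1 / 3, 1 / 3, 1 / 3])
g2 = 10 * entropy([1 / 3, 2 / 3])
print(round(g1 / g2, 3), errors_reduced_enough(0.20, 0.10, g1, g2))
```

In the first case the entropy falls by more than half while the errors are only halved, so the gain in r could be attributed to the manipulation alone; in the second the errors fall faster than the entropy.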

The proportion in formula 3 may
also be written as


$$H_2 / p_{E_2} \geq H_1 / p_{E_1},$$


and this suggests the possibility of
setting up a standard $H/p_E$ ratio
with which to compare H to $p_E$ values
obtained in practice. Taking into
consideration the values of both r
and $H/p_E$ would decrease the
possibility of accepting as scalable an
area which is not. It might also
occasionally provide the evidence
needed to accept as scalable an area
about which we would have
otherwise been in doubt. Choosing a
standard ratio, call it $H_s/p_{E_s}$, is
largely a matter of deciding how
conservative we wish to be. It seems
logical to choose $p_{E_s} = .1$, as this
corresponds to an r of .90. We have
a rough guide in choosing $H_s$ in
Guttman's statement (10, p. 79) that at
least ten items should be used if
they are all dichotomies. If we
assume that the $p_i$'s are normally
distributed, so that each $p_i$ is
associated with an equal portion of the
area under the probability curve,
$H_s/p_{E_s}$ then becomes 8.9/0.1 = 89. If
we accept this standard ratio, then
we would require an r of more than
.90 should the entropy of the item
series fall below 8.9. If, for example,
our item series had an H value of 17.8
(which would be extremely high),
an r of .80 would most likely be as
strong an indication of scalability as
the r of .90 in the first example,
although a conservative experimenter
would probably prefer to exceed the
minimum values for both r and $H/p_E$.

A previously mentioned situation
which frequently occurs in practice,
for which the above considerations


RICHARD WILLIS 


are also in order, is that in which a
subset of items which seems to form
a scale is chosen out of the total
number of items administered.
Choosing such a subset imposes a
greater structuring on the situation
and produces a corresponding H drop.
As in the case of combining categories,
$H/p_E$ as well as r should be noted
before drawing conclusions. Whether
or not an experimenter chooses to
use a standard H to $p_E$ ratio, that
suggested above or any other, it would
seem advisable in any case to report
any manipulation, such as combining
categories or picking out subsets of
items, and the resulting H drops
along with the coefficients of
reproducibility, as this information will be
helpful in interpreting these r's.
When categories are combined or
subsets of items are picked out
according to the usual practice, there
are actually two distinct processes
going on simultaneously. First, there
is the increased structuring and the
concomitant decrease in entropy.
Second, there is the process of
selection carried on by the experimenter in
which he capitalizes on the
opportunities which are presented by the
data. The first process is an
automatic result of decreasing the number
of items or response categories and
would obtain no matter whether the
combining of categories or selection
of item subsets is done selectively or
entirely at random. It is the first
process, the automatic H drop, which
is adjusted for by the $H/p_E$ ratio, but
there remains the problem of
accounting for the process of selection.

It would seem that the only test
of scalability which allows for this
second factor is that of cross
validation on a new sample. And obviously,
the greater the H drop, the greater
the possible influence of selection, and
therefore the more essential it
becomes to cross validate. Not only
should the new coefficient of
reproducibility and the new H drop, taken
together, meet the minimum values,
but it is also important that the items
arrange themselves in the same order
of difficulty or very nearly so. It is
also desirable that the most
advantageous way of combining
response categories be similar for both
samples.

There is another situation which
does not occur often in practice,
but which nevertheless is interesting
from a theoretical standpoint.
Assume that we wish to interpret a
coefficient of reproducibility based on
some other sample size than the
usual 100. For some reason only 80
respondents are available. Can we
assume that 80 respondents will
supply us with 80 per cent as much
information as 100 respondents in the
sense that the associated H values for
the two testing situations would be in
the ratio of 80 to 100? This
assumption would not be justified. The
processes of combining categories and
selecting a subset of items were both
selective processes, and thus a
definite structuring of the situation took
place favoring a reduction of the
proportion of nonfitting responses. But
there is no such selection in the case
above, and there is no reason to
suppose that the proportion of errors will
be changed.* Since the degree of
structuring is not changed, there will


* As R. L. Thorndike has pointed out to the 
author, in extreme reduction of the sample size 
there will be an appreciable reduction in the 
proportion of errors because of the fact that 
the responses among which we are looking for 
disagreement are also those which determine 
the scale values of the respondents. This 
spurious consistency is probably negligible, 
however, for values of N which are large in 
comparison to the number of possible scale 
values, which is equal to the number of items 
plus one.
be no change in the associated entropy
value. 
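
Since the entropy of the item series does not depend on the number of respondents, the standard ratio suggested earlier can be applied in the same way whatever the sample size. The sketch below is one possible reading of that standard, not a procedure given in the paper: it assumes the ratio of 8.9 to 0.1 and asks what r a series of entropy H would need for its own H to error ratio to reach it. The names `STANDARD_RATIO` and `required_r` are hypothetical.

```python
# Standard ratio suggested in the text: H of 8.9 against an error
# proportion of 0.1, i.e. about 89.
STANDARD_RATIO = 8.9 / 0.1


def required_r(h):
    """Smallest r for which H / p_E reaches the standard ratio.

    The largest tolerable error proportion is H / STANDARD_RATIO,
    and r = 1 - p_E.
    """
    max_p_e = h / STANDARD_RATIO
    return 1.0 - max_p_e


for h in (6.0, 8.9, 17.8):
    print(f"H = {h:5.1f}  required r = {required_r(h):.3f}")
```

With these assumptions a series with H of 8.9 needs an r of .90 and one with H of 17.8 needs only .80, which matches the two cases discussed in the text; a series with lower entropy needs an r above .90.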

Thus it seems reasonable to
assume that the mean of the
distribution function of r is not appreciably
affected by moderate changes in N,
although the standard error of r
obviously will be. When comparing
two r's based on different sample
sizes, we need not make any
allowance for different entropy values; we
need only keep in mind that the
smaller the N, the larger the
standard error of r. As present theory does
not enable us to write the expression
for this standard error, we can be
safe only by avoiding sample sizes
below the accepted standard of 100.

Table 1 gives the entropy values
associated with dichotomous items
for various differences between the
response frequency proportions.


TABLE 1

ENTROPY VALUES ASSOCIATED WITH DICHOTOMOUS
ITEMS FOR VARIOUS DEGREES OF RESPONSE BIAS

Item Entropy

1.0000
.9928
.9709
.9341
.8824
.8113
.7219
.6098
.4690
.2864
.0000
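
The entries of Table 1 can be recomputed from formula 1. The sketch below assumes that the rows correspond to splits running from 50-50 to 100-0 in steps of five percentage points; this mapping is inferred from the printed values, since the table's first column is not available here, and one or two printed entries differ slightly from what the formula gives.

```python
import math


def dichotomy_entropy(p):
    """Entropy in bits of a two-category item with proportions p and 1 - p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no uncertainty
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))


# Assumed row labels: percentage split between the two categories.
for percent in range(50, 101, 5):
    p = percent / 100
    print(f"{percent:3d}-{100 - percent:<3d} {dichotomy_entropy(p):.4f}")
```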


More extensive tables of values for
$-\sum p \log_2 p$ have been published (8),
or if necessary, a table of common
logarithms may be used to compute
$\log_2 p$ values by use of the following
relationship:


$$\log_2 p = \log_{10} p / \log_{10} 2 = 3.3219 \log_{10} p.$$





. Festinger, L.
. Guttman, L.


NOTE ON SCORE TRANSFORMATION AND
NONPARAMETRIC STATISTICS 
WALTER C. STANLEY 


Brown University 


It is becoming increasingly popular
to assess the statistical significance
of markedly skewed data (e.g.,
latency of running or bar-pressing
responses in rats) by means of
nonparametric statistics. This is not
surprising in view of the marked ease of
calculation. However, one source of
labor saving, working directly with
raw data rather than with transformed
scores, may sometimes be a
mixed blessing. This could be the
case when working with ranks of
differences between paired scores.
Here some score transformations
would alter the magnitudes of the
differences and thus their ranks.

Consider Wilcoxon's (1) test for
paired replicates applied to the raw
scores in Table 1. These scores could
be latencies in seconds of a running
response in rats obtained under two
conditions of extinction. The largest
difference is in the negative direction

The smaller of the two sums of ranks
of like signs is therefore 8, and the
difference is not significant, a sum of
4 being required for a p of .05.

In the right half of Table 1, the
same statistical test is carried out on
the differences in the reciprocals of
these scores. Again one rank is
assigned a negative sign, but here this
rank is 1, and the p obtained is
between .02 and .01.


TABLE 1

ILLUSTRATIVE DATA ANALYZED IN TWO WAYS WITH WILCOXON'S PAIRED
REPLICATES TEST

Raw Scores
Subject   Cond. A   Cond. B   Diff.   Rank

1
2
3
4
5
5
2
7

Reciprocals
Cond. A   Cond. B   Diff.
1.00      .33       .67
 .50      .12       .38
 .33      .14       .19
 .25      .11       .14
 .20      .17       .03
 .04      .06      -.02
 .50      .20       .30
 .14      .07       .07


Clearly some consideration must
be given to the meaningfulness of
scale units, and more generally, to
the population of values to which
one wishes to generalize in order to
take full advantage of this “rapid
approximate” statistical technique.
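
The effect described in this note can be reproduced with a small sketch of the rank sums used in Wilcoxon's paired replicates test. The raw latencies below are hypothetical: they were chosen only to be roughly consistent with the reciprocals column of Table 1 and are not the author's data. The function, whose name is illustrative, ranks the absolute differences (averaging ranks for ties) and returns the sums of the ranks belonging to positive and to negative differences.

```python
def signed_rank_sums(a, b):
    """Return (sum of ranks of positive differences, sum for negative ones)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]   # zero differences dropped
    order = sorted(range(len(diffs)), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        end = pos
        # extend over any run of tied absolute differences
        while end + 1 < len(order) and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = average_rank
        pos = end + 1
    plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return plus, minus


# Hypothetical latencies in seconds (assumed, not taken from the source).
cond_a = [1, 2, 3, 4, 5, 25, 2, 7]
cond_b = [3, 8, 7, 9, 6, 16, 5, 14]

raw = signed_rank_sums(cond_a, cond_b)
reciprocal = signed_rank_sums([1 / x for x in cond_a], [1 / x for x in cond_b])

print("raw scores, smaller rank sum: ", min(raw))
print("reciprocals, smaller rank sum:", min(reciprocal))
```

With these assumed scores the one subject whose difference runs against the others carries the largest rank on the raw scale and the smallest rank on the reciprocal scale, so the smaller rank sum changes from 8 to 1, as in the note.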


1. WILCOXON, F. Some rapid approximate
statistical procedures. Stamford, Conn.:
American Cyanamid Co., 1949.


OSGOOD, CHARLES E. Method and
theory in experimental psychology.
New York: Oxford Univer. Press,
1953. Pp. vii+800. $10.00 ($14.00
trade edition).


According to the preface, this book
was written “to provide undergraduate
majors and graduate students in
psychology with a text that evaluates
experimental literature in close
relation to critical theoretical issues.”
For such use it has several virtues.
Not only are the theoretical issues
about which the book is centered
important in their own right, but they
also provide schemata that should
help the student to comprehend and
retain the myriad facts of
experimental research. Osgood's
description of many experiments is so
minute as to be the next best thing to
the originals themselves.
Furthermore, within its range, the book
reflects with marvelous faithfulness the
problems and atmosphere of
experimental psychology as this is
represented in the latest volumes of
American psychological journals.

This timeliness is apparently not
due to unusual recency in the
citations. A sample of 100 was drawn
from the 1,290 references in the
bibliography; when these were
distributed by publication date, their
median date was found to be 1937. A
similar sample of 100 from the
approximately 1,800 references in
Woodworth's Experimental Psychology
of 1938 (with which Osgood's
book is bound to be compared)
yielded a median date of 1922. Thus,
the median ages of the citations at
the time of the publication of the
two books were both 16 years. (Is
there a law here?) Woodworth's
references are, however, more skewed
toward early dates than Osgood's.

On the other hand, the book is
both massive and difficult. Two or
three pages on vacuum tube
amplifiers and cathode ray oscillographs
will be insufficient for the student who
is untrained in physics and scarcely
needed for one so trained. There is
a bit of mathematics here and there,
but probably not enough to frighten
the bright and well-motivated
students for whom the book must have
been written. The first 300 pages
presuppose rather extensive
acquaintance with neural anatomy and
neurophysiology; a few anatomical
drawings, including maps of cortical
cytoarchitecture, would have been
helpful here.


Then, too, for all of its 727 pages
of text, the book is limited in scope.
The author says that he has covered
“... the major part of what is
called experimental psychology,
including sections on sensory processes,
perception, learning, and symbolic
processes.” In 1938, Woodworth
had chapters on these topics plus
three on feeling and emotion and one
each on “experimental esthetics,”
GSR, reaction time, attention, and
reading. The lack of material on
motivation and action, which are
treated at least in part by such a
recent book as Stevens' Handbook, as
well as other topics that other readers
will no doubt miss, will be felt keenly
in any broadly conceived proseminar
in which Osgood's book is used. It is
also likely to raise the tiresome
question, what is experimental
psychology?


The word “method” appears in the
title but not as a heading in the
subject index, and it can be said that the
text deals more with method in
particular than with method in
general. One extensive evaluation and
comparison of methods as such does
appear, namely, that on
psychophysics, but this is not wholly satisfactory.
The method of average error is
described inadequately and, the
reviewer believes, incorrectly. The
“flicker-fusion method” is given treatment
coordinate with that of the standard
methods even though it must itself
employ one of these latter methods.
Further, it appears that the author is
more worried than he need be over the
interpretation of the results of the
two-category method of constant
stimuli.

To regard this work merely as a 
textbook or handbook would, how- 
ever, overlook its most significant as- 
pirations, which have to do with 
the evaluation of theory and the 
maintenance, in the author's words, of
“a certain continuity of approach.”
Now, it is notorious that experiment- 
al psychology does not easily submit 
to systematic unification, and we 
regretfully judge that this proposi- 
tion is not threatened by the present 
book. The theoretical atmosphere of 
the first six chapters, on sensory and 
perceptual processes, is noticeably 
different from that of the remaining 
ten, on perceptual dynamics, learn- 
ing, thinking, and language. Osgood’s 
Hullian views have no relevance to 
such matters as the cortical basis for 
sensory quality and intensity, the 
quantal hypothesis in audition, and 
the significance of visual adaptation. 
What little integration is achieved 
for sensory psychophysics and psy- 
chophysiology comes from the au- 
thor’s interest in the mode of action 
of the cerebral cortex, where he 
prefers neurostatistical conceptions 
to the “dynamics” of Köhler.
Helson's conception of adaptation level,
which might have been used to link 
together some problems that remain 
unconnected, is not mentioned. 

Beginning, however, with the chap- 
ter on central factors in perception 
and continuing through most of the 
remainder of the book, consideration 
is given to major theorists. Tolman, 
Guthrie, the gestalt psychologists, 
and Hull are discussed extensively, 
and briefer accounts are given of 
Hebb, Skinner, and others. It is ob- 
vious that, in most instances, Osgood 
has made intense efforts to under- 
stand and to present sympathetically 
opinions with which he partly or 
wholly disagrees, as well as to exhibit 
in full relief the weaknesses of those 
by which he takes his own stand. 
How successful he has been in the 
exposition of each of these views will 
certainly be judged differently by 
readers of different theoretical pre- 
dispositions. The reviewer's belief 
is that the systems of Hull (up through
Principles of Behavior—the later 
books were too recent to be included), 
Guthrie, and Tolman are discussed in 
such a way as to give their followers 
little cause for complaint; indeed, 
Osgood has constructed challenging 
formalizations of the last two, with 
the aid of Voeks in Guthrie's case. 
Gestalt theory is more alien to him, 
and he makes the mistake of writing 
as if Köhler, Wertheimer, Koffka,
Lewin, and J. F. Brown adhered to 
one system in common. 

Three pages are allotted to Hebb 
as an exponent of physiological the- 
ory. Unfortunately, the more purely 
psychological aspects of Hebb’s ideas 
are lost sight of, so that, for exam- 
ple, his contributions to the the- 
ory of thinking are not referred to 
in Osgood’s chapters on this topic. 
Least satisfactory is the treatment 
accorded to Skinner. His views on
the role of theory in psychology are
omitted, and some of the remarkable 
regularities that he has established, 
which are clearly of concern to a 
learning theorist, are given little or no 
attention. This is true, for example, 
of the effects of different
reinforcement schedules. Periodic
reinforcement is mentioned in passing, with
the comment that “The mechanism
whereby an animal smoothly adjusts
its rate of response under these
conditions is unknown.” “Intermittent
reinforcement” is discussed more
thoroughly, but Skinner's name is not
mentioned in connection with it. The
same is true of the problem of the
number of kinds of learning. A
rather misleading statement (p. 307)
could cause the reader to suppose 
that Skinner accepts the concept of 
disinhibition. 

Osgood's own “mediation hypothesis,”
by which he hopes to effect a
rapprochement between “reinforcement”
and “cognitive” theories, is
not a hypothesis in the ordinary
sense, but rather a broad extension of
Hull's concepts of anticipatory goal
responses and their associated
proprioceptive stimuli. The following
statement from the chapter on
language behavior exemplifies one of the
functions attributed to mediators,
that of serving as signs:

a pattern of stimulation which is not 
the object is a sign of the object if it evokes 
in the organism a mediating reaction, this (a) 
being some fractional part of the total be- 
havior elicited by the object and (b) producing 
distinctive self-stimulation that mediates re- 
sponses which would not occur without the 
previous association of nonobject and object 
patterns of stimulation (p. 696). 


Such mediated self-stimulation has
cue, motivating, and reinforcing
properties. Mediators are not
necessarily peripheral but may be purely
cortical events; they are said to have
the status of “hypothetical
constructs.” The following comments
are made by the reviewer, with the
caveat that they should be taken as
no more than fragments of a possible
critical analysis of the hypothesis.
(a) The mediators are so
substitutable for the “ideas” of association
theory that occasionally the
discussion sounds like a mere translation of
such theory into Hullian language. It
would be no defense to attack
cognitive theory on the same grounds. (b)
While the evidence submitted in fa- 
vor of the hypothesis seems to indi- 
cate that some sort of mediation is 
involved in the behaviors that are de- 
scribed, it does not, in the reviewer's 
opinion, discriminate clearly in favor 
of fractional goal responses. (c) The 
explanations by mediators have an 
essentially chainlike character, and 
are open to all of the objections that 
have been raised against such a con- 
ception since Dewey’s paper on the 
reflex arc. (d) The attribution to me- 
diators of both motivating and rein- 
forcing properties leads to the same 
extreme difficulties that leave the 
stimulus-reduction theory of rein- 
forcement hanging by a thread on 
page 443. (e) One would like more 
evidence that the hypothesis can pre- 
dict new and unexpected phenomena. 

On questions of philosophy of sci- 
ence and theory of knowledge the 
book is disconcertingly artless, and 
the thought is more energetic than 
subtle. Although Osgood calls
himself a materialist, and says that
“Behaviorally, ... the environment is a
pattern of neural energies in the
central nervous system,” and that, when
one touches something, “The
awareness of sensation is not ... in the
fingertips ... but in the brain,” he
nevertheless wrestles with the ghosts 
of introspectionism without van- 
quishing them, since they remain to 
haunt many pages of the book. At the
same time, he shows no great
sensitivity to the problems of earlier days;
e.g., almost anything that one might
say about oneself is called
“introspection.” For examples from other
contexts, Lloyd Morgan's canon is
treated as if it were subject to experi- 
mental test; and the discussion of 
repression leads to the
methodologically awkward conclusion that
motivated forgetting “... is certainly a
valid observation in the clinic and
probably would be verified in the
laboratory. ...”

So ambitious a work can be ex- 
pected to contain some inaccurate 
and debatable matter. Thus, the all- 
or-none law is so stated as to make it 
depend upon the full utilization of 
“‘materials’” in the neuron; in con- 
nection with the Talbot-Plateau law, 
it is asserted inaccurately that 
” . with a light-dark ratio of one 
to one, the fused field will appear one- 
half as bright as the illuminated sec- 
tor’; the color solid, which some 
might regard as a minor triumph of 
taxonomy, is dismissed as a mere 
pedagogical device, apparently on the 
incorrect supposition that it was in- 
tended to represent the facts of color 
mixture; it is said that interference 
theories of extinction never specify 
the interfering responses, whereas a 
number of such responses were identi- 
fied two pages earlier in the descrip- 
tion of research by Wendt; a refer- 
ence to work by Birenbaum and Zei- 
garnik leaves the impression that 
Lewinian theory presumes that 
boundaries between regions within 
the person are less permeable in the 
child than in the adult, when in fact 
the theory asserts the reverse. 

The book is seriously marred by 
numerous errors of grammar, spelling, 
and typography, as well as other 
blemishes of which some are matters 
of taste. It is regrettable that most 
of these were not weeded out by the 
publisher's editing. A few
miscellaneous examples follow.

The names Dashiell, Dimmick, and
Boguslavsky are misspelled; Prägnanz is spelled
pregninz; the plural of plexus is given as
plexi; e.g. is almost uniformly given the
meaning that is, and similar confusions occur
in the use of i.e., cf., and viz.; to refute is taken
to mean no more than to contest; etc. Of
typographical errors, perhaps the most 
troublesome is the printing of a negative ex- 
ponent in an exponential expression as a sub- 
traction (p. 357). The table titles and figure 
captions are generally too sparing, and the 
text does not always refer fully, or even ac- 
curately, to the figures. Thus, the text refers 
to dashed and solid lines in Fig. 123 but only 
the latter are found; Fig. 5 is too sketchy 
for the text; hair cells are mentioned in the 
text but not identified in Fig. 5 and Fig. 24; 
the abscissa of Fig. 63 should be labeled 
millimicrons; more labels are needed for Fig. 
207; the text refers to Fig. 177 when Fig. 176 
is intended. (On the other hand, many of the 
figures are excellent, as, e.g., Fig. 89 and 
101.) It may also be mentioned that the 
type seems very small for the length of line 
on the large single-column page. 


The foregoing, however distress- 
ing, are minutiae, while Osgood's 
accomplishments are not. Scattered 
throughout the book are original 
theoretical and experimental contri- 
butions on topics ranging from color 
contrast to transfer and retroaction in 
learning. Several chapters are unusu- 
ally interesting, e.g., Chapter 7 on 
perception and Chapter 14 on prob- 
lem solving. Furthermore, the author 
is at his best in assembling the evi- 
dence for and against testable research 
hypotheses and doggedly following 
the trail of the experiments even when 
they lead in an undesired direction; 
and this is, after all, his principal 
intention. 

FRANCIS W. IRWIN. 

University of Pennsylvania. 


HAVIGHURST, ROBERT J., & AL-
BRECHT, RUTH. Older people. New
York: Longmans, Green, 1953. 
Pp. xvi+415. $5.00. 


This book is an admirable antidote 
to the prophets of doom and gloom
who identify aging with senility, disa-
bility, and frustration. It is based on
a detailed survey of a small midwest- 
ern city (7,000 in the town itself and 
4,000 in the adjacent trading area) 
and presents factual data about what 
the older people there are like (670 of 
them 65 or older). Individual inter- 
views were carried out with a sample 
of 100 of the old people drawn to 
represent the entire old age popula-
tion by sex, socioeconomic status, and
marital status.

The book, however, is more than a 
compilation of factual information. 
It adds a conceptual framework for 
the analysis of the needs and desires 
of individual aging people, an analy- 
sis of the community's reaction to the 
aged, and a formulation of basic 
principles essential to the effective 
planning of programs for the aged. 
Writing in an easy, flowing style, 
with enough anecdotal material to
maintain a lively interest, the authors 
manage to impart a great deal of fac- 
tual information without the reader's 
being aware of it. 

The book should be general in its 
appeal, not only to those who are 
especially concerned with the prob- 
lems of aging, but also to the aging 
people themselves. Thus, the physi-
cian who may be exposed daily to
the aches and pains of older people
will be agreeably surprised to learn
that 79 per cent of the oldsters in this
cultural environment regarded them-
selves as “healthy” and that only 6
per cent were homebound and 2 per
cent actually bedridden. The social
worker, who must deal with the prob- 
lem of finances, living arrangements, 
etc. among the aged, ought to know 
that 43 per cent of this population 
reported they were happily situated 
and that only one-fourth were actu- 
ally unhappy. It is also important to 
know that happiness was not related 
to economic status and that health
became a major component of happi-
ness only in the 20 per cent of the
population who had specific com- 
plaints; i.e., old age was what the in- 
dividual made it, with economic and 
health aspects assuming a secondary 
role. All but 4 per cent of the popula- 
tion reported that they had enough 
money to get along on provided they 
had no unusual medical expenses. 
Anxiety about what they would do in 
case of protracted illness requiring 
hospitalization was present in a large 
proportion of the elderly people. 
There was no evidence that the com- 
munity rejected old people because of 
their age. However, the importance 
of the role of the individual in the 
community was emphasized. In gen-
eral, the community seemed quite per-
missive about what it expected of older
people and looked with favor on con- 
tinued activity. 

The authors present a number of 
concepts that may require further 
analysis. For example, their outline 
of the meaning of work shows the 
need for a variety of retirement plans 
if we are to meet the needs both of 
the individual and of society. The 
concept of socioeconomic mobility is 
applied to the psychological adjust- 
ments of aging people in an inter- 
esting manner. Whether all of the 
conceptual formulations will prove 
useful remains to be seen, but the au- 
thors are to be complimented on their 
willingness to theorize and thus re- 
duce a mass of specific observations to 
some intelligible formulation. 

Psychologists with a quantitative 
psychometric viewpoint will be dis- 
appointed by the use of nonscaled 
questionnaire techniques, and sociolo-
gists may be disturbed by the inclu-
sion of a chapter on “A Personal and
Social Philosophy of Old Age” which
makes specific recommendations
about “rational defenses” that can be
used by older people. One might also
quibble about whether the popula- 
tion of Prairie City is representative 
of the entire United States. How- 
ever, the solid facts of how one group 
of aging people behaved in a definable 
environment have been admirably 
explored by this study. Only by simi- 
lar studies in different community 
environments can the generality of 
the findings be settled. 

It is good to know that in some 
community structures a large propor- 
tion of the older people can optimisti- 
cally meet the problems of aging. 
This is a book that should be read by 
everyone interested in the welfare of 
his own community. 

N. W. SHOCK.

Baltimore City Hospitals. 


FEDERN, PAUL. Ego psychology and
the psychoses. (E. Weiss, Ed.) New 
York: Basic Books, 1953. Pp. 375. 
$6.00. 


The editor had the difficult task of 
organizing the presentation of a sys- 
tematic theory of ego functioning out 
of another man's half-century (1901- 
1952) of prolific publishing (almost 
100 articles, lectures, etc.). The 40- 
year association as pupil and friend 
of Paul Federn provided important 
background for this task. In the 21-
page introduction, Weiss attempts to
establish Paul Federn’s orthodox
loyalty to Dr. Freud (to take care of
the “minor” differences in the area of
ego theory), to apologize for Federn's
difficulties in exposition which is “very
rich in content but... often complex
etc.” (p. 21), and to present a con-
densed, clarifying guide to the
metapsychological wilderness of the
text. He apparently did not feel that
Federn could provide in 16 articles
and 340 pages a sufficiently clear
presentation of his thoughts, but un-
fortunately the guide itself needs a
guide. The first five “theoretical
papers” are a maze of unclear verbal 
gymnastics, poorly defined concepts, 
and repeated self-adulation by
Federn which does not sufficiently re-
ward the reader for his difficult labor
of seeking meaning. In this part of
the book, Federn shares a difficulty
common to many psychoanalytic au-
thors who consider operationalism a
primitive scientific procedure and
prefer an artistic verbal medium.
The second part of the book (nine
papers) is a fuller measure of Dr.
Federn’s contribution. Here, his rich
clinical experience and observational
astuteness appear to good advantage.
A conception of the ego differing in 
many respects from that postulated 
by Freud is presented. “Ego Feeling”
seems to be viewed as a basic energy
which integrates ego functioning and
makes ego experience possible. An
interesting hypothesis suggesting a
somatic and mental differentiation of
ego feeling is developed. There is a
much-repeated emphasis on psychosis
as an ego-defeat phenomenon, as
against the neurotic dynamics of ego
defense. The so-called “Steinach
effect” is defended as a sufficient
basis for the sterilization of latent
psychotics and psychotics. However,
the psychological rationale given in
Federn’s papers hardly constitutes a
convincing argument.

This book is not the “Vade Medi-
cum” for the treatment of psychosis
which the publishers enthusiastically
proclaim on the book-jacket blurb. It
is certainly not recommended for
general reading. However, if the
reader is familiar with psychoanalytic
terminology and has the perseverance
to follow Federn through “thick and
thick,” then he may find in the papers
many provocative speculations based
on extensive clinical experience.
There might even be the reward of
deriving fruitful hypotheses for ex-
perimental investigation from some of
these speculations.

M. ERIK WRIGHT.

University of Kansas.


McCURDY, HAROLD GRIER. The
personality of Shakespeare. New 
Haven: Yale Univer. Press, 1953. 
Pp. xi+243. $5.00. 

Almost twenty years ago Caroline 
Spurgeon published the results of her 
extensive study of Shakespeare's
figures of speech from which she
drew a number of inferences concern-
ing Shakespeare's interests, preoccu-
pations, attachments, personality
traits, personality development, and
the like.¹ Her study was based upon
the premise that a person reveals
things about his personality by the
kinds of metaphors he uses.

¹ Shakespeare's imagery and what it tells us.
New York: Macmillan, 1935.

McCurdy has tried to get at the
personality of Shakespeare by a dif-
ferent route. He has counted the
number of lines that every character
in every play speaks in order to de-
termine the principal characters in
each play. The assumption is made
that the more important features of
Shakespeare's personality are repre-
sented in the principal characters. In
addition to character analysis, Mc-
Curdy has analyzed some of the ma-
jor themes that run through a num-
ber of plays much as one would
analyze a TAT protocol.

For anyone who admires Shake-
speare and who feels that the psy-
chologist has something to offer in the
way of method for shedding more
light upon the kind of person that
Shakespeare must have been in order
to write as he did, this is a fascinating
study. Dr. McCurdy is dedicated to
his subject and writes with sensi-
tivity and critical devotion.

Most psychologists do not quarrel
with the hypothesis that a writer
projects himself into his writings,
and that it should be possible there-
fore to make a personality analysis of
a writer from his writings alone. The 
same hypothesis probably holds for 
painters, composers, architects, de- 
signers, and anyone who produces 
something out of his imagination. 
However, Shakespeare is the worst 
possible subject on whom to test the 
hypothesis. Virtually nothing is 
known about the man. In fact, so 
little is known that a number of 
people believe that someone else 
must have written the plays at- 
tributed to Shakespeare. Such being 
the case, how can one ever hope to 
confirm or infirm the inferences that 
are made about the personality of 
Shakespeare from his writings? 

I wish that Dr. McCurdy had de- 
voted his considerable talents both as 
a psychologist and as a literary critic 
to the examination of a writer about 
whom a great deal is known. Hem- 
ingway or Mickey Spillane would 
make excellent choices. Is it true, for 
instance, that Mike Hammer repre- 
sents the shadow-side of Spillane’s 
personality? How much of the old 
man of the sea is to be found in 
Hemingway? McCurdy’s methods of
analysis would undoubtedly strike
pay dirt and reveal a great deal about 
the nature of projection if he used 
them on more appropriate subjects. 

CALVIN S. HALL. 

Western Reserve University. 


STOLUROW, LAWRENCE M. (Ed.) 
Readings in learning. New York: 
Prentice-Hall, 1953. Pp. viii+555. 
$6.00. 

Stolurow has collected 42 biblio- 
graphical items from the field of 
learning, either in whole or in part, 
and by combining three into one unit 
and two into another, has presented 
39 articles under one cover. His pur-
pose was to provide a sample of origi-
nal material to supplement secondary
source materials since “only by read-
ing and analysing original reports can
the student learn how to conduct a
variety of different types of research
and at the same time become aware of
the problems and difficulties in-
volved.” Presumably this book was
intended to serve as a second text in
a course in theories of learning or per-
haps as a source book in a course in
which no text was used, although
Stolurow suggests that, “Where a
laboratory is available, this volume 
could serve as a laboratory text. The 
experimental studies could be used as 
models for both experiments and 
report writing, and the theoretical 
articles as bases for new studies.” 

It is a fairly safe estimate that at 
least 5,000 bibliographical items have 
been published in the past 70 years 
from which Stolurow chose 42. His 
selections could well be analyzed in 
terms of whether they are essentially 
systematic as opposed to experimen- 
tal articles, in terms of how well they 
represent the various problem areas 
that have been of concern to people 
interested in learning, in terms of how 
well various theoretical points of view 
are represented, and in terms of 
whether these particular items are in 
fact good models. 

The book is divided into eight sec- 
tions or chapters. The first of these, 
titled “Some Systematic Positions,”
contains six units representing eight
of the 42 articles selected. These
eight and seven others in later chap-
ters, something over 35 per cent of the
selections, are essentially nonexperi-
mental. In addition, “Every chapter
and article contains some theory.
This is a sign of the times.” Thus the
book is very heavily weighted in favor
of theory and its exposition, although
it does contain 27 detailed research
reports.

Stolurow’s selections are not repre-
sentative of points of view. He re-
ports that he based his selections on
five points of quality and the addi- 
tional criteria that selections should 
be recent rather than old, human 
rather than animal, and S-R rather 
than gestalt. His chapter on system- 
atic positions has selections from 
Thorndike, Hull, Guthrie, Estes, and 
Skinner, representing S-R positions, 
and Tolman, who is somewhat diffi- 
cult to classify. The other systematic 
or nonexperimental articles are by
Hull, Mowrer, Pavlov, Skinner, 
Spence, Tolman, and Woodworth. 

Aside from the S-R bias, which is 
confessed and deliberate, the nature 
of the selection process is partially 
revealed by who is omitted. There 
are, by rough count, 947 references 
cited in this list of 39 items. Thirteen 
men have ten or more references on 
this list. Stolurow has selected arti- 
cles from eight of these men, but has 
not selected any from the writings of 
Hilgard, Krech (Krechevsky in the
bibliography), McGeoch, Maier, or
Neal Miller. Each has published at
least one “classic” article in the
period from which these items were
selected. Twenty-two were published
between 1947 and 1950 and the re-
maining 20 between 1928 and 1946.

The 27 articles which report experi-
ments in detail are heavily weighted
in favor of experiments involving hu-
man subjects (18) as against those
involving animals (9). This weight-
ing follows largely from the choices
of subject matter represented. There
are two experimental articles on con-
ditioning concepts and techniques,
four on motivation and reinforce-
ment, six on motor and verbal learn-
ing, three on discrimination and
perceptual learning, two on educa-
tional and social learning, five on re-
tention and forgetting, and five on
transfer.

As models of research design, there
can be little quarrel with Stolurow's 
selections. They are some of the 
better examples of the kind of article 
usually found in the Psychological 
Review, the Journal of Experimental 
Psychology, and the Journal of Com-
parative and Physiological Psychology,
from which most of his selections 
were taken. They are also typical of 
the writing style of these journals 
and are good models if one wishes to 
perpetuate this style. If one wishes 
to teach a more intelligible style, 
some of these selections can be used 
as examples of what not to do. 

The book is printed by a photo-
offset process. The right-hand mar-
gins of the pages are not justified.
The paper used is sufficiently trans-
parent that printing from the back
of the page and from the next page
shows through. The binding is a hard
cover, but it is covered with paper
rather than cloth and the spine of my
copy is broken on both sides from the
handling it has received in preparing
this review. It is a shabby job of
book making and, from this stand-
point, overpriced by a very wide
margin.

EDWARD L. WALKER.

University of Michigan.


PSYCHOTHERAPY RESEARCH GROUP, 
PENNSYLVANIA STATE COLLEGE, 
(Wm. U. Snyder, Chairman). 
Group report of a program of re- 
search in psychotherapy. State Col- 
lege: Pennsylvania State Coll. 
Press, 1953. Pp. iii+179. $2.25. 


Regardless of divergent viewpoints
about the merits of client-centered
counseling, much credit is due Rogers
and his students for their pioneering
and ingenious research with ver-
batim interview recordings. This
report is part of a sequence which
began little more than a decade ago,
but it already represents a second
generation effort, being the col-
laborative production of nine doc-
toral students directed by William
U. Snyder. The report consists of a 
description of the research program, 
nine thesis condensations, a summary 
and discussion by Dr. Snyder, a bibli- 
ography, and a detailed appendix 
that provides the rating and coding 
procedures utilized in the studies. 

The basic sample of cases studied 
was 100 student counselees from which
maximum N samples were drawn 
depending upon various selection cri- 
teria, e.g., number of interviews, 
transcribability of interviews, tests 
taken, etc. In addition to analyses 
based upon the recorded interviews, 
use was made of pre- and postcounsel- 
ing tests. These tests were the Ror- 
schach, the MMPI, and the Mooney 
Problem Check List. Further data 
were provided by ratings made by 
clients, counselors, and independent 
judges using such devices as a post- 
counseling client scale, a therapist 
personality scale, and a posttherapy 
counselor check list. Somewhat sur- 
prisingly in view of recent trends, no 
Q sorts were employed. 

Among the topics investigated were 
the reasons for early dropouts, the 
relationships between counselor and 
client characteristics, the develop- 
ment of a composite criterion for 
measuring client progress, the pre- 
dictability of client verbal behavior 
during counseling, indices of resist- 
ance, and comparison of the charac- 
teristics of more and less successful 
cases. 

A number of new hypotheses and 
variables have been introduced in 
these studies and, despite the fact 
that most of the findings are incon- 
clusive, the report is an important 
contribution to research methodol- 
ogy. Many readers will doubtless be 
disturbed by the continuing defense 
offered for the position that the effec-
tiveness of therapy can be evaluated
without the necessity of external cri-
teria.

LEONARD S. KOGAN.

Institute of Welfare Research,
Community Service Society of
New York.


STEPHENSON, W. The study of behav-
ior. Chicago: Univer. of Chicago
Press, 1953. Pp. ix+376. $7.50. 


If Stephenson had set forth his 
purposes in writing this book, which
carries the subtitle “Q-technique and
its methodology,” the reviewer's task
would be easier. Our inference is
that his aims were the following: (a)
to challenge much of current method-
ology in psychology, (b) to explain
Q-methodology and (c) to show by
illustration how it can put psychol-
ogy’s “house in scientific order,” and
(d) to demonstrate that theory test- 
ing and scientific conclusions are pos- 
sible on the basis of a single case. 


Perhaps when the author speaks of 
the “platform upon which we are to 
campaign,” he is telling us that his
sole aim is to promote Q-methodol- 
ogy. 

We shall not attempt to list here 
all of the concepts and all of the
people against which and whom
Stephenson arrays himself for battle. 
He does not have any faith in ordi- 
nary factor analysis (R-methodol- 
ogy), in measurement, in norms, in 
large samples, or in any so-called 
generalizations springing therefrom. 
He admits that he alone is “in step
and all others out” (p. 348), but this
does not keep him from citing what-
ever supporting fragments he can
find, whether these be found in the
writings of J. R. Kantor or of J. M.
Keynes or of some very obscure per-
son. His sallies, courageously set
forth, will be found either interesting
or irritating, according to the pro-
clivity of the reader.

With regard to purpose b, one would
expect that an author who complains 
because such intellects as Godfrey 
Thomson and Cyril Burt have mis- 
understood his writing would make a 
special effort at clear and concise 
exposition. Instead, we find a poorly 
organized, piecemeal presentation, 
more confusing than enlightening. 
Thus, we can anticipate continuing 
misunderstanding, and consequent 
misuse of Q-methodology. 

The second half of the book is
devoted mainly to applications of Q-
methodology in the areas of type psy-
chology, questionnaire analysis, so-
cial psychology, self-psychology, per-
sonality, projective tests, and clinical
psychology (a chapter to each area).
We are told that “Q-technique has
its applications in almost every nook
and cranny of psychology in its re-
search aspects” (p. 338) and “in every
branch of psychology where behavior
is at issue” (p. 343). If we are to
judge from the given illustrative ap-
plications, the quoted claims repre-
sent wishful thinking.

Clinicians and others who, by
necessity or by choice, deal with a
single case will find comforting re-
assurance and be motivated to read
further by “We are to work, instead,
with a single person, at the call of a
theory. Yet we shall reach valid,
scientific conclusions” (p. 5). The
continuous stress on the merits of
the single case leads ultimately to “In
principle, one may work scientifically
for a lifetime with a single case” (p.
343). Unfortunately, by the time
one has spent a lifetime developing a
set of principles for predicting (or
explaining) every fragment of be-
havior of a single case, the subject
will have ceased to behave. Or
another logical conclusion to this sort
of thing is that psychologists must
develop two and a half billion “sci-
ences” to explain the behavior of the
two and a half billion human inhabi-
tants on this planet! What of the
task of animal psychologists? 

The fact that our comments have 
been restricted to general reactions 
should not be misconstrued as indi- 
cating that the margins of the re- 
viewer's copy of the book are free of 
specific questions. Far from it. 

QUINN MCNEMAR. 

Stanford University. 


GOLDHAMER, HERBERT, & MARSHALL,
ANDREW W. Psychosis and civiliza-
tion. Glencoe, Ill.: The Free Press,
1953. Pp. 126. $4.00. 


This slim volume consists of a re- 
print of two statistical studies in 
the frequency of mental disease. The 
first and more significant investiga- 
tion is concerned with an analysis of 
admissions to mental hospitals in 
Massachusetts and Oneida County, 
New York, extending back to 1840. 
Consistent with data previously re- 
ported by others for the past half 
century, the present authors found 
that for age groups under 50 there 
has been no increase in the frequency 
of psychoses over the past 100 years. 
This finding should put an end to 
the recurring myth that psychoses 
are a product of the stress and strain 
of modern life. The second paper 
presents expectancy rates of mental 
disease. It differs from earlier studies 
in that the tables prepared state the 
risk of admission to a mental hospital 
between any two points of an indi- 
vidual's life. This method of presen- 
tation serves to accentuate the high 
incidence of admissions for the older 
age group. If a 60-year-old male sur- 
vives to age 85 he runs a 10 per cent 
risk of being admitted to a mental 
hospital. 

JAMES D. PAGE.

Temple University.


FRENCH, THOMAS M. The integration
of behavior. Vol. II. The integrative
process in dreams. Chicago: Univer.
of Chicago Press, 1954. Pp. xi+367.
$6.50.

When the history of twentieth cen- 
tury psychology is written, it will un- 
doubtedly be characterized as the 
period when two great streams of 
psychological thought, one flowing 
from the laboratory, the other from 
the clinic, converged and ran to- 
gether to form a unified science of 
dynamic psychology. The future 
historian will observe that the dialec- 
tical process which eventuated in the 
synthesis of experimental and clinical 
psychology took a long time and had 
to overcome many obstacles. Among 
the obstacles that he will discuss, one 
at least is apparent to us today, 
namely the insularity of the pro- 
ponents of each of these major orien- 
tations. This insularity prevents one 
side from communicating with the 
other, so that each remains ignorant 
of what the other stands for. 

Fortunately for psychology there 
are indications that the iron curtain 
of insularity is being lifted and that 
a real exchange and integration of 
ideas are beginning to take place. 
The name-calling era is drawing to 
a close. Psychologists brought up in 
the tradition of experimental psy- 
chology are reading and being insemi- 
nated by psychoanalysis, and psycho- 
analysts, though to a lesser extent, 
are reading and being inseminated 
by experimental psychology. An out- 
standing example of a psychoanalyti- 
cally trained investigator whose 
thinking has been fertilized by inter- 
course with systematic experimental 
psychology is Thomas French, asso- 
ciate director of the Chicago Institute 
of Psychoanalysis. 

In a number of articles published 
during the last 20 years, French has
demonstrated his ability to synthe-
size the two orientations. Now, in
his impressive work in progress, three 
volumes of which are still to be pub- 
lished, we are privileged to witness 
the culmination of his integrative en- 
deavors. The present volume is an 
application of the basic postulates 
set forth in Volume I to an under- 
standing of dreams. 

Essentially, what French has done 
is to graft Tolman’s cognitive theory 
onto Freud’s motivational theory. 
French’s key concept is cognitive 
structure, by which he means “a
hierarchy of plans for achieving an
end-goal.” Implicit in this definition
is the concept of motive since every 
plan involves striving toward a goal. 

French not only believes that every 
dream has a cognitive structure but 
also that the cognitive structures of 
different dreams of the same person 
form a pattern. The cognitive struc-
ture of a dream is discovered by using
various sources of evidence, namely,
information about the dreamer, free
associations, translation of symbols
by functional analysis, and compari-
sons of one dream with the other
dreams of a series. All of this infor-
mation is blended together by em-
ploying the method of internal con-
sistency, which is the favorite method
of psychoanalytic investigators. The
end result is a comprehensive under-
standing of the dreamer’s conflicts,
their relative intensities and inter-
relationships, their historical origins
and contemporary significance, their 
underlying physiological and motiva- 
tional patterns, the dreamer’s plans 
and capacities for resolving them, 
and his hopes of success. 

The rhetorical strategy of the book 
consists of discussing the dreams of 
a patient who is undergoing the type 
of psychoanalytic treatment prac- 
ticed at the Chicago Institute of Psy- 
choanalysis. Some readers may ques- 
tion whether the strategy is a success- 
ful one. This reader found that it be- 
came quite tedious to try to follow 
and bear in mind all of the intricacies 
of analyzing the dream series. It 
may well be that every idea is related 
in some way to every other idea, and 
that there is almost an infinity of 
components and levels in the cogni- 
tive structure of a particular human 
mind, but I wonder whether there is 
not a better way to get this across.
Perhaps not. It seemed to me that 
there was an unconscionable amount 
of redundancy and that more severe 
editing could have made the book 
more readable. French's use of dia- 
grams is an aid to quick understand- 
ing and should have enabled him to 
abbreviate the extended discussions. 

In spite of these literary defects 
the book is a solid contribution to the 
major task of twentieth century psy- 
chology. 

CALVIN S. HALL. 

Western Reserve University.